This document takes the various outputs produced by salmon/hisat/htseq/etc and generates the data which will be used in all of the following analyses.
Periodically, the shared metadata sheet was downloaded locally to the sample_sheets/ directory, given a suffix corresponding to the current worksheet revision, and the various trimming/alignment/etc statistics were manually added. Finally, some demographics were appended to the sample metadata. The resulting modified metadata was used as the input for all following analyses.
samplesheet <- "sample_sheets/tmrc3_samples.xlsx"
testthat::expect_true(file.exists(samplesheet))
demographics <- "sample_sheets/tmrc3_demographicsv2.xlsx"
hslp_samplesheet <- "sample_sheets/tmrc3_samples_pruned.xlsx"
species_identities <- "sample_sheets/identified_parasite_species.xlsx"
data_structures <- c(data_structures, "samplesheet", "demographics")The primary annotation sources are:
Both provide GO data. They also provide helpful links to other data sources. For the moment, we are focusing on the human annotations. In the first instance, the annotations are acquired via some functions in hpgltools which seek to make biomaRt (Smedley et al. (2009)) a little more robust. In the second instance, the annotations are extracted from a locally generated orgDB/annotationDbi (“AnnotationDbi” (n.d.)) instance.
These analyses have focused on gene-level abundances/differences. Thus, when htseq-count was invoked against the hisat2-based mappings, parameters were chosen to count genes rather than transcripts. In this context, a gene refers to the non-redundant union of the transcripts’ exons. Similarly, when salmon counts were used via tximport, a mapping of genes to transcripts was used to collapse the matrix to gene-level abundances. This decision may be revisited.
The parasite annotations were downloaded from the tritrypdb via the provided REST interface, exported into an OrgDB instance, and extracted via the EuPathDB package.
hs_annot <- load_biomart_annotations(year = "2020", month = "jan")## Using mart: ENSEMBL_MART_ENSEMBL from host: jan2020.archive.ensembl.org.
## Successfully connected to the hsapiens_gene_ensembl database.
## Finished downloading ensembl gene annotations.
## Finished downloading ensembl structure annotations.
## symbol columns is null, pattern matching 'symbol'.
## Including symbols, there are 68435 vs the 249606 gene annotations.
## Not dropping haplotype chromosome annotations, set drop_haplotypes = TRUE if this is bad.
## Saving annotations to hsapiens_biomart_annotations.rda.
## Finished save().
hs_annot <- hs_annot[["annotation"]]
## The next two lines make a bridge between the gff file used by hisat2
## and the gene IDs.The following lines may be used if one wishes to use the relatively strict ID.version system (optionally) enforced by salmon. It should be noted that newer versions of tximport have options which ignore this, and I also now have a function which will set the ID.version for the gene annotations to be the same as what was observed in the count tables. Either of those strategies makes the following not necessary; but if you wish to be a little bit pedantic about the gene<->Tx IDs, then this should give you a sense of how well matched are the downloaded annotations and gene IDs used by salmon.
hs_annot[["transcript"]] <- paste0(rownames(hs_annot), ".", hs_annot[["version"]])
hs_annot <- group_mean_cds_length(hs_annot)
hs_tx_annot <- hs_annot ## Make a copy so I don't lose transcript IDs for salmon
rownames(hs_annot) <- make.names(hs_annot[["ensembl_gene_id"]], unique = TRUE)
not_unique_idx <- grepl(x = rownames(hs_annot), pattern = "\\.\\d+$")
hs_annot <- hs_annot[!not_unique_idx, ]The following block will stop working soon. Its purpose is to download the current (or a specific version) annotations from one of the eupathdb.org websites. These are being retired/moved soon. My version of the EuPathDB package creates installable orgdb packages; so I presumably will just need to include the appropriate tarballs in the recipe for this container. In response to the retirement of eupathdb, I spun up an instance of my eupathdb package to download everything; hopefully it will finish before the various servers go down. I was too late for schistodb, it is already offline (I am pretty sure I have an older copy though).
Given that I recently included the parasite transcripts to the data, I should also spend a moment and figure out how to include the annotations.
As a starting point, I will just grab the EuPathDB panamensis annotations. I think I will need to limit them to just the shared columns and drop the rest.
meta <- sm(download_eupath_metadata(webservice = "tritrypdb"))
panamensis_entry <- get_eupath_entry("MHOM", metadata = meta[["valid"]])
panamensis_db <- make_eupath_orgdb(panamensis_entry)
panamensis_pkg <- panamensis_db[["pkgname"]]
package_name <- panamensis_db[["pkgname"]]
if (is.null(panamensis_pkg)) {
panamensis_pkg <- panamensis_db[["orgdb_name"]]
package_name <- panamensis_pkg
}
tt <- library(panamensis_pkg, character.only = TRUE)
panamensis_pkg <- get0(panamensis_pkg)
all_fields <- columns(panamensis_pkg)
all_lp_annot <- sm(load_orgdb_annotations(
panamensis_pkg,
keytype = "gid",
fields = c("annot_gene_entrez_id", "annot_gene_name",
"annot_strand", "annot_chromosome", "annot_cds_length",
"annot_gene_location_text", "annot_gene_product")))$genesOk, I added the orgdb tarball and an entry to install it to my bootstrap for this container. So let us load the annotations for our Leishmanial strain of interest!
package_name <- "org.Lpanamensis.MHOMCOL81L13.v68.eg.db"
tt <- library(package_name, character.only = TRUE)## Loading required package: AnnotationDbi
##
## Attaching package: 'AnnotationDbi'
## The following object is masked from 'package:dplyr':
##
## select
##
panamensis_pkg <- get0(package_name)
all_fields <- columns(panamensis_pkg)
all_lp_annot <- sm(load_orgdb_annotations(
package_name,
keytype = "gid",
fields = c("annot_gene_entrez_id", "annot_gene_name",
"annot_strand", "annot_chromosome", "annot_cds_length",
"annot_gene_location_text", "annot_gene_product")))[["genes"]]The following was added to support the possibility of making a single expressionset using the mappings against a genome of concatenated hg38 and lpanamensis sequence. In order to do that, one must have an annotation set with consistent columns across all genes from both species.
I am not sure if I copied those count tables to the container (I suspect I did not), but I will leave the code here as an example of how one might make a concatenated expressionset/summarizedexperiment.
all_lp_annot[["end"]] <- gsub(x = all_lp_annot[["annot_gene_location_text"]],
pattern = "^.*\\.\\.", replacement = "")
all_lp_annot[["end"]] <- gsub(x = all_lp_annot[["end"]],
pattern = "\\(.*$", replacement = "")
all_lp_annot[["end"]] <- gsub(x = all_lp_annot[["end"]],
pattern = "\\,", replacement = "")
all_lp_annot[["start"]] <- gsub(x = all_lp_annot[["annot_gene_location_text"]],
pattern = "^.*:", replacement = "")
all_lp_annot[["start"]] <- gsub(x = all_lp_annot[["start"]],
pattern = "\\.\\..*$", replacement = "")
all_lp_annot[["start"]] <- gsub(x = all_lp_annot[["start"]],
pattern = "\\,", replacement = "")
all_lp_annot[["mean_cds_len"]] <- all_lp_annot[["annot_cds_length"]]
all_lp_annot[["version"]] <- 1
all_lp_annot[["transcript_version"]] <- 1
all_lp_annot[["hgnc_symbol"]] <- ""
all_lp_annot[["uniprot_gn_symbol"]] <- ""
all_lp_annot[["transcript"]] <- all_lp_annot[["gid"]]
all_lp_annot[["ensembl_transcript_id"]] <- all_lp_annot[["gid"]]
colnames(hs_tx_annot)## [1] "ensembl_gene_id" "ensembl_transcript_id" "version"
## [4] "transcript_version" "description" "gene_biotype"
## [7] "cds_length" "chromosome_name" "strand"
## [10] "start_position" "end_position" "hgnc_symbol"
## [13] "uniprot_gn_symbol" "transcript" "mean_cds_len"
colnames(all_lp_annot)## [1] "gid" "annot_external_db_name"
## [3] "gene_type" "annot_gene_entrez_id"
## [5] "annot_gene_name" "annot_strand"
## [7] "annot_chromosome" "annot_cds_length"
## [9] "annot_gene_location_text" "annot_gene_product"
## [11] "end" "start"
## [13] "mean_cds_len" "version"
## [15] "transcript_version" "hgnc_symbol"
## [17] "uniprot_gn_symbol" "transcript"
## [19] "ensembl_transcript_id"
wanted_columns <- c("gid", "ensembl_transcript_id", "version", "transcript_version", "annot_gene_product",
"gene_type", "annot_cds_length", "annot_chromosome", "annot_strand", "start",
"end", "hgnc_symbol", "uniprot_gn_symbol", "transcript", "mean_cds_len")
all_lp_annot <- all_lp_annot[, wanted_columns]
colnames(all_lp_annot) <- colnames(hs_tx_annot)
hslp_annot <- rbind(hs_tx_annot, all_lp_annot)Finally, dump the current gene ontology data from the EuPathDB-generated orgdb instance. This may be used by goseq/topgo/gostats/etc…
Just as a lark, grab orthologs across the EuPathDB as well in the following block.
One might reasonably ask why in the heck the gene ID column was changed to ‘ensembl_gene_id’, this is to make it easier (possible) to merge these annotations with the ensembl human annotations, if one chooses to combine the two genomes and examine them together.
lp_go <- load_orgdb_go(package_name)## The chosen keytype was not available. Using 'GID'.
## This is an orgdb, good.
lp_go <- lp_go[, c("GID", "GO")]
lp_lengths <- all_lp_annot[, c("ensembl_gene_id", "mean_cds_len")]
colnames(lp_lengths) <- c("gid", "length")
all_lp_annot[["description"]] <- tolower(all_lp_annot[["description"]])
## extract_eupath_orthologs is from EuPathDB and isn't really
## being used for anything in this document afaik.
##orthos <- extract_eupath_orthologs(db = panamensis_pkg)
data_structures <- c(data_structures, "lp_lengths", "lp_go", "all_lp_annot", "meta")One should pay careful attention to the columns used to concatenate the gene/version and transcript/version. If these do not match exactly, many genes will be dropped by tximport without any output to signify something is wrong.
Note to self, I made a fun function which grabs the ids from salmon results. In addition I found that tximport can optionally ignore them.
## The tx_gene_map is used for tximport and salmon quantifications.
hs_annot[["transcript"]] <- paste0(rownames(hs_annot), ".",
hs_annot[["transcript_version"]])
rownames(hs_annot) <- make.names(hs_annot[["ensembl_gene_id"]], unique = TRUE)
hs_tx_gene_map <- hs_annot[, c("transcript", "ensembl_gene_id")]The set of GO categories has not been limited to the 2020 data at the time of this writing. The GO categories is collected along with lengths for goseq. The other methods either have built-in databases of human data (gProfiler) or support orgDB data (org.Hs.eg.db) (clusterProfiler/topGO/gostats).
hs_go <- try(load_biomart_go()[["go"]])## Using mart: ENSEMBL_MART_ENSEMBL from host: useast.ensembl.org.
## Successfully connected to the hsapiens_gene_ensembl database.
## Finished downloading ensembl go annotations, saving to hsapiens_go_annotations.rda.
## Saving ontologies to hsapiens_go_annotations.rda.
## Finished save().
if ("try-error" %in% class(hs_go)) {
hs_go <- NULL
}
hs_length <- hs_annot[, c("ensembl_gene_id", "cds_length")]
colnames(hs_length) <- c("ID", "length")
data_structures <- c(data_structures, "hs_length", "hs_go", "hs_annot")Before we do any of the following subsets/analyses of the data, we need to collect it all in one place. Let’s do that here and name it ‘hs_expt’, it will comprise the full set of human data. When we cull some samples it will be renamed to ‘hs_valid’.
The variable ‘data_structures’ will contain the names of every variable I create in this document which I wish to save to disk.
Similarly, the character vector ‘sanitize_columns’ is the list of metadata columns over which I will iterate and sanitize the data, removing any spurious spaces, capitalization, etc.
The primary function used is ‘create_expt()’, which is responsible for combining the metadata (from the first block of code), the gene annotations (from the 2nd/3rd), and count tables (included in the metadata as a column of filenames) into a single expressionset (Huber et al. (2015)). The sanitize_columns variable defines the metadata columns which will be explicitly sanitized beyond the general checks performed. ‘subset_genes()’ is responsible for including only the protein coding genes in the data; it records the amount of information lost to the metadata as column ‘ncrna_lost’. ‘sanitize_metadata()’ will perform some sanity checks of the metadata to ensure that excel did not do anything untoward to the metadata. The ’set_expt_*()’ functions modify the metadata to make it more friendly for future tasks, including setting some columns to factors, defining the column of interest vs. known surrogate(s). Finally, ‘subset_expt()’ is used to drop samples which are known to not be of interest, like the few samples from people who presented as healthy in the clinic (when the clinicalpresentation column of the metadata is ‘h’).
One note if anyone actually reads this, I have a sister function which creates summarizedExperiments (“SummarizedExperiment” (n.d.)) instead of expressionSets, and I think all of my code should move to that; but it just has not happened because I just haven’t had a chance to finish writing the tests and making whatever little changes would be required to get them to all pass. If my fictitious reader is interested, I would love help completing the switch from exprs->se.
sanitize_columns <- c("visitnumber", "finaloutcome", "donor",
"typeofcells", "clinicalpresentation", "drug",
"condition", "batch", "clinic")
hs_expt <- create_expt(samplesheet,
file_column = "hg38100hisatfile",
savefile = glue("rda/tmrc3_hs_expt_all_raw-v{ver}.rda"),
gene_info = hs_annot) %>%
subset_genes(column = "gene_biotype", method = "keep",
pattern = "protein_coding", meta_column = "ncrna_lost") %>%
sanitize_expt_pData(columns = sanitize_columns) %>%
set_expt_factors(columns = sanitize_columns, class = "factor") %>%
set_expt_conditions(fact = "finaloutcome") %>%
set_expt_batches(fact = "visitnumber") %>%
subset_expt(subset = "typeofcells!='pbmcs'") %>%
subset_expt(subset = "typeofcells!='macrophages'") %>%
subset_expt(subset = "clinicalpresentation!='h'")## Reading the sample metadata.
## Error in eval(expr, envir, enclos) :
## basic_string::substr: __pos (which is 8) > this->size() (which is 5)
## Did not find the column: sampleid.
## Setting the ID column to the first column.
## Dropped 69 rows from the sample metadata because the sample ID is blank.
## Warning in extract_metadata(metadata, id_column = id_column, ...): There were
## NA values in the condition column, setting them to 'undefined'.
## Warning in extract_metadata(metadata, id_column = id_column, ...): There were
## NA values in the condition column, setting them to 'undefined'.
## The sample definitions comprises: 254 rows(samples) and 88 columns(metadata fields).
## Warning in create_expt(samplesheet, file_column = "hg38100hisatfile", savefile
## = glue("rda/tmrc3_hs_expt_all_raw-v{ver}.rda"), : Some samples were removed
## when cross referencing the samples against the count data.
## Matched 21476 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final expressionset has 21481 features and 246 samples.
## remove_genes_expt(), before removal, there were 21481 genes, now there are 19952.
## There are 8 samples which kept less than 90 percent counts.
## TMRC30015 TMRC30269 TMRC30017 TMRC30019 TMRC30044 TMRC30045 TMRC30154 TMRC30241
## 79.21 89.19 85.69 89.70 80.30 73.32 83.18 89.42
## The numbers of samples by condition are:
##
## cure failure lost notapplicable
## 130 64 16 36
## The number of samples by batch are:
##
## 1 2 3 notapplicable
## 108 56 54 28
## The samples excluded are: TMRC30001, TMRC30002, TMRC30003, TMRC30004, TMRC30005, TMRC30006.
## subset_expt(): There were 246, now there are 240 samples.
## The samples excluded are: TMRC30059, TMRC30060, TMRC30061, TMRC30062, TMRC30063, TMRC30051, TMRC30064, TMRC30065, TMRC30162, TMRC30066, TMRC30067, TMRC30117, TMRC30057, TMRC30069, TMRC30266, TMRC30268, TMRC30286, TMRC30249, TMRC30267, TMRC30252, TMRC30250, TMRC30251, TMRC30245, TMRC30246, TMRC30247, TMRC30248, TMRC30243, TMRC30244.
## subset_expt(): There were 240, now there are 212 samples.
## The samples excluded are: TMRC30007, TMRC30008.
## subset_expt(): There were 212, now there are 210 samples.
data_structures <- c(data_structures, "hs_expt")
## The following should make visit 1 the largest if one uses that column as a size factor when plotting.
meta <- pData(hs_expt) %>%
mutate(visitnumber = fct_relevel(visitnumber, c("3", "2", "1")))
start_order <- rownames(meta)
## Incorporate the identified strains
identified <- openxlsx::read.xlsx(species_identities)
no_data_idx <- identified[["Species"]] == "No hay dato"
identified <- identified[!no_data_idx, ]
identified <- identified[, c("Patient", "Species")]
colnames(identified) <- c("Patient", "ParasiteSpecies")
meta <- merge(meta, identified, by.x = "tubelabelorigin", by.y = "Patient", all.x = TRUE)
meta <- sanitize_metadata(meta, columns = "ParasiteSpecies", spaces = TRUE)
## Note that the merge removes the rownames and maybe changes sample order.
rownames(meta) <- meta[["tmrcidentifier"]]
meta <- meta[start_order, ]
pData(hs_expt) <- meta
save(list = "hs_expt", file = glue("rda/tmrc3_hs_expt_all_sanitized-v{ver}.rda"))There are some parameters provided by the clinicians which may prove to be of interest. I therefore am adding the demographics here. I will also run some simple checks to try to make sure that the clinician’s metadata agrees with the sample metadata.
In this first block, read the demographics and sanitize out the encoded NAs (999 and 888 or whatever) and add a column to join it to the sample sheet data.
hs_demographics <- openxlsx::read.xlsx(demographics)
hs_demographics[["join"]] <- tolower(hs_demographics[["Patient_ID"]])
columns_with_encoded_na <- c("Adherence_Glucantime", "Adherence_Miltefosine",
"Total_Ampules_Prescribed_Milt", "Prescribed_Daily_Dose_Milt",
"Total_Ampules_Prescribed_Gluc", "Prescribed_Daily_Dose_Gluc",
"V5_Total_Area_Ulcer", "V5_Vertical_Axis_Ulcer",
"V5_Horizontal_Axis_Ulcer", "V5_Total_Area_Lesion",
"V5_Vertical_Axis_Lesion", "V5_Horizontal_Axis_Lesion",
"V4_Total_Area_Ulcer", "V4_Vertical_Axis_Ulcer",
"V4_Horizontal_Axis_Ulcer", "V4_Total_Area_Lesion",
"V4_Vertical_Axis_Lesion", "V4_Horizontal_Axis_Lesion",
"V3_Total_Area_Ulcer", "V3_Vertical_Axis_Ulcer",
"V3_Horizontal_Axis_Ulcer", "V3_Total_Area_Lesion",
"V3_Vertical_Axis_Lesion", "V3_Horizontal_Axis_Lesion",
"V2_Total_Area_Ulcer", "V2_Vertical_Axis_Ulcer",
"V2_Horizontal_Axis_Ulcer", "V2_Total_Area_Lesion",
"V2_Vertical_Axis_Lesion", "V2_Horizontal_Axis_Lesion",
"V5_New_Lesions", "V4_New_Lesions", "V3_New_Lesions",
"V2_New_Lesions")
for (col in columns_with_encoded_na) {
na_idx <- hs_demographics[[col]] == 999
message("Column ", col, " contains ", sum(na_idx), " encoded NA values.")
hs_demographics[na_idx, col] <- NA
}## Column Adherence_Glucantime contains 5 encoded NA values.
## Column Adherence_Miltefosine contains 35 encoded NA values.
## Column Total_Ampules_Prescribed_Milt contains 35 encoded NA values.
## Column Prescribed_Daily_Dose_Milt contains 35 encoded NA values.
## Column Total_Ampules_Prescribed_Gluc contains 5 encoded NA values.
## Column Prescribed_Daily_Dose_Gluc contains 5 encoded NA values.
## Column V5_Total_Area_Ulcer contains 38 encoded NA values.
## Column V5_Vertical_Axis_Ulcer contains 38 encoded NA values.
## Column V5_Horizontal_Axis_Ulcer contains 38 encoded NA values.
## Column V5_Total_Area_Lesion contains 38 encoded NA values.
## Column V5_Vertical_Axis_Lesion contains 38 encoded NA values.
## Column V5_Horizontal_Axis_Lesion contains 38 encoded NA values.
## Column V4_Total_Area_Ulcer contains 35 encoded NA values.
## Column V4_Vertical_Axis_Ulcer contains 35 encoded NA values.
## Column V4_Horizontal_Axis_Ulcer contains 35 encoded NA values.
## Column V4_Total_Area_Lesion contains 35 encoded NA values.
## Column V4_Vertical_Axis_Lesion contains 35 encoded NA values.
## Column V4_Horizontal_Axis_Lesion contains 35 encoded NA values.
## Column V3_Total_Area_Ulcer contains 10 encoded NA values.
## Column V3_Vertical_Axis_Ulcer contains 10 encoded NA values.
## Column V3_Horizontal_Axis_Ulcer contains 10 encoded NA values.
## Column V3_Total_Area_Lesion contains 10 encoded NA values.
## Column V3_Vertical_Axis_Lesion contains 10 encoded NA values.
## Column V3_Horizontal_Axis_Lesion contains 10 encoded NA values.
## Column V2_Total_Area_Ulcer contains 4 encoded NA values.
## Column V2_Vertical_Axis_Ulcer contains 4 encoded NA values.
## Column V2_Horizontal_Axis_Ulcer contains 4 encoded NA values.
## Column V2_Total_Area_Lesion contains 4 encoded NA values.
## Column V2_Vertical_Axis_Lesion contains 4 encoded NA values.
## Column V2_Horizontal_Axis_Lesion contains 4 encoded NA values.
## Column V5_New_Lesions contains 21 encoded NA values.
## Column V4_New_Lesions contains 14 encoded NA values.
## Column V3_New_Lesions contains 4 encoded NA values.
## Column V2_New_Lesions contains 2 encoded NA values.
factor_columns <- c("Sex", "Ethnicity", "Infection_Location", "Previously_Diagnosed",
"Adverse_Effects_V3", "Adverse_Effects_V4", "Therapeutic_Outcome_V3",
"Therapeutic_Outcome_V4", "Therapeutic_Outcome_V5",
"Therapeutic_Outcome_Final")
for (col in factor_columns) {
hs_demographics[[col]] <- as.factor(hs_demographics[[col]])
}
## Extract the clinic from the patient id via a cheeseball regex.
hs_demographics[["Clinic"]] <- gsub(x = hs_demographics[["Patient_ID"]],
pattern = "^SU(\\d){1}.*$", replacement = "\\1")
data_structures <- c(data_structures, "hs_demographics")In the following we add a column ‘join’ analagous to that added to the demographics, this will be used to add the various columns across each person’s samples. I guess I could have done this with merge(all.x = TRUE, all.y = TRUE)
hs_pd <- pData(hs_expt)
hs_pd[["join"]] <- hs_pd[["tubelabelorigin"]]
hs_pd_demographics <- plyr::join(hs_pd, hs_demographics, by = "join")
hs_pd_demographics[["join"]] <- NULL
rownames(hs_pd_demographics) <- rownames(hs_pd)
na_idx <- is.na(hs_pd_demographics)
sum(na_idx)## [1] 8651
hs_pd_demographics[na_idx] <- "undefined"
data_structures <- c(data_structures, "hs_pd_demographics")In addition we will create a series of composite variables from combinations of extant factors.
pData(hs_expt) <- hs_pd_demographics
## Now add in the ethnicity and sex
## Add a combination of the ethnicity and clinic
## While I am at it, do sex+clinic
table(pData(hs_expt)[["Sex"]])##
## 1 2
## 173 37
table(pData(hs_expt)[["Ethnicity"]])##
## 1 2 3
## 101 62 47
table(pData(hs_expt)[["clinic"]])##
## cali tumaco
## 67 143
etnia_names <- as.character(pData(hs_expt)[["Ethnicity"]])
etnia_idx <- etnia_names == 1
etnia_names[etnia_idx] <- "afrocol"
etnia_idx <- etnia_names == 2
etnia_names[etnia_idx] <- "indigena"
etnia_idx <- etnia_names == 3
etnia_names[etnia_idx] <- "mestiza"
pData(hs_expt)[["etnia"]] <- as.factor(etnia_names)
table(pData(hs_expt)[["etnia"]])##
## afrocol indigena mestiza
## 101 62 47
table(pData(hs_expt)[["Ethnicity"]])##
## 1 2 3
## 101 62 47
etnia_factor <- paste0(pData(hs_expt)[["clinic"]], "_",
etnia_names)
pData(hs_expt)[["clinic_etnia"]] <- as.factor(etnia_factor)
table(pData(hs_expt)[["clinic_etnia"]])##
## cali_afrocol cali_indigena cali_mestiza tumaco_afrocol tumaco_indigena
## 15 33 19 86 29
## tumaco_mestiza
## 28
pData(hs_expt)[["sex"]] <- "female"
male_idx <- pData(hs_expt)[["Sex"]] == 1
pData(hs_expt)[male_idx, "sex"] <- "male"
pData(hs_expt)[["sex"]] <- as.factor(pData(hs_expt)[["sex"]])
sex_factor <- paste0(pData(hs_expt)[["clinic"]], "_",
pData(hs_expt)[["sex"]])
pData(hs_expt)[["clinic_sex"]] <- as.factor(sex_factor)
table(pData(hs_expt)[["clinic_sex"]])##
## cali_female cali_male tumaco_female tumaco_male
## 12 55 25 118
pData(hs_expt)[["etnia_sex"]] <- paste0(pData(hs_expt)[["etnia"]], "_",
pData(hs_expt)[["sex"]])
pData(hs_expt)[["etnia_sex"]] <- as.factor(pData(hs_expt)[["etnia_sex"]])
table(pData(hs_expt)[["etnia_sex"]])##
## afrocol_female afrocol_male indigena_female indigena_male mestiza_female
## 22 79 6 56 9
## mestiza_male
## 38
## Set a factor of samples which are visit 1 vs. 2/3.
v1_samples <- pData(hs_expt)[["visitnumber"]] == "1"
pData(hs_expt)[v1_samples, "visitbipart"] <- "v1"
pData(hs_expt)[!v1_samples, "visitbipart"] <- "vother"
table(pData(hs_expt)[["visitbipart"]])##
## v1 vother
## 100 110
pData(hs_expt)[["cell_visit_cf"]] <- paste0(pData(hs_expt)[["typeofcells"]], "_",
pData(hs_expt)[["visitnumber"]], "_",
pData(hs_expt)[["finaloutcome"]])
age_vector <- pData(hs_expt)[["Age"]]
pData(hs_expt)[["age_cat"]] <- make_quartile_factor(age_vector)
save(list = "hs_expt", file = glue("rda/tmrc3_hs_expt_all_sanitized_demographics-v{ver}.rda"))Check to make sure the demographics agrees with the metadata. I really want this block to return NULL upon completion.
two_columns <- pData(hs_expt)[, c("finaloutcome", "Therapeutic_Outcome_Final")]
undef_idx <- two_columns[[2]] == "undefined"
two_columns <- two_columns[!undef_idx, ]
two_columns[["rewritten"]] <- "undef"
cure_idx <- two_columns[[2]] == 0
two_columns[cure_idx, "rewritten"] <- "cure"
fail_idx <- two_columns[[2]] == 1
two_columns[fail_idx, "rewritten"] <- "failure"
lost_idx <- two_columns[[2]] == 2
two_columns[lost_idx, "rewritten"] <- "lost"
same_idx <- two_columns[["finaloutcome"]] == two_columns[["rewritten"]]
broken <- two_columns[!same_idx, ]
## I very much want the following line to evaluate to 0 rows.
broken## [1] finaloutcome Therapeutic_Outcome_Final
## [3] rewritten
## <0 rows> (or 0-length row.names)
There are some metadata factors for which I think it will be nice to see the numbers before and after our filters. The following shows how many samples we have of the primary types before filtering.
dim(pData(hs_expt))## [1] 210 163
table(pData(hs_expt)[["drug"]])##
## antimony miltefosine
## 202 8
table(pData(hs_expt)[["clinic"]])##
## cali tumaco
## 67 143
table(pData(hs_expt)[["finaloutcome"]])##
## cure failure lost
## 130 64 16
table(pData(hs_expt)[["typeofcells"]])##
## biopsy eosinophils monocytes neutrophils
## 21 45 73 71
table(pData(hs_expt)[["visitnumber"]])##
## 3 2 1
## 54 56 100
summary(as.numeric(pData(hs_expt)[["Evolution_Time"]]))## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 6.00 8.25 12.00 21.00
summary(as.numeric(pData(hs_expt)[["Prescribed_Daily_Dose_Gluc"]]))## Warning in summary(as.numeric(pData(hs_expt)[["Prescribed_Daily_Dose_Gluc"]])):
## NAs introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 13.0 14.0 18.0 17.2 20.0 20.0 8
summary(as.numeric(pData(hs_expt)[["V3_Vertical_Axis_Lesion"]]))## Warning in summary(as.numeric(pData(hs_expt)[["V3_Vertical_Axis_Lesion"]])):
## NAs introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 7.0 18.1 23.2 32.7 90.0 53
summary(as.numeric(pData(hs_expt)[["V3_Total_Area_Lesion"]]))## Warning in summary(as.numeric(pData(hs_expt)[["V3_Total_Area_Lesion"]])): NAs
## introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 222 1761 2889 4315 16965 53
summary(as.numeric(pData(hs_expt)[["V3_Horizontal_Axis_Lesion"]]))## Warning in summary(as.numeric(pData(hs_expt)[["V3_Horizontal_Axis_Lesion"]])):
## NAs introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 3.6 9.8 28.0 28.7 41.6 80.1 53
table(pData(hs_expt)[["sex"]])##
## female male
## 37 173
table(pData(hs_expt)[["etnia"]])##
## afrocol indigena mestiza
## 101 62 47
table(pData(hs_expt)[["Ethnicity"]])##
## 1 2 3
## 101 62 47
summary(as.numeric(pData(hs_expt)[["Age"]]))## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 25.0 28.5 30.7 36.0 51.0
summary(pData(hs_expt)[["Weight"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 53.9 59.0 74.7 72.4 82.0 100.8
summary(pData(hs_expt)[["Height"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 145 160 166 167 174 183
table(pData(hs_expt)[["Therapeutic_Outcome_Final"]])##
## 0 1 2 3
## 130 64 16 0
table(pData(hs_expt)[["finaloutcome"]])##
## cure failure lost
## 130 64 16
unique(pData(hs_expt)[["Patient_ID"]])## [1] "SU1034" "SU1154" "SU1167" "SU1168" "SU1174" "SU1176" "SU1175" "SU2050"
## [9] "SU2052" "SU2065" "SU1195" "SU1196" "SU1197" "SU2066" "SU1198" "SU2068"
## [17] "SU2071" "SU2072" "SU2073" "SU2159" "SU2162" "SU2161" "SU2164" "SU2167"
## [25] "SU2168" "SU2172" "SU2173" "SU2183" "SU2184" "SU2195" "SU2188" "SU2190"
## [33] "SU2194" "SU2201"
length(unique(pData(hs_expt)[["Patient_ID"]]))## [1] 34
Get the sex of the unique patients.
unique_people <- unique(pData(hs_expt)[["Patient_ID"]])
unique_people## [1] "SU1034" "SU1154" "SU1167" "SU1168" "SU1174" "SU1176" "SU1175" "SU2050"
## [9] "SU2052" "SU2065" "SU1195" "SU1196" "SU1197" "SU2066" "SU1198" "SU2068"
## [17] "SU2071" "SU2072" "SU2073" "SU2159" "SU2162" "SU2161" "SU2164" "SU2167"
## [25] "SU2168" "SU2172" "SU2173" "SU2183" "SU2184" "SU2195" "SU2188" "SU2190"
## [33] "SU2194" "SU2201"
length(unique_people)## [1] 34
df <- pData(hs_expt)
people <- df[["Patient_ID"]]
first_indices <- order(people)[!duplicated(sort(people))]
table(df[first_indices, "sex"])##
## female male
## 7 27
table(df[first_indices, "finaloutcome"])##
## cure failure lost
## 22 10 2
There are lots of ways which we will categorize the data, here are some potential color choices for them.
color_choices <- list(
"cf" = list(
"cure" = "#998EC3",
"failure" = "#F1A340"),
"cflost" = list(
"cure" = "#998EC3",
"failure" = "#F1A340",
"lost" = "#343434"),
"type_visit" = list(
"monocytes_v1" = "#DD1C77",
"monocytes_v2" = "#C994C7",
"monocytes_v3" = "#E7E1EF",
"eosinophils_v1" = "#31A354",
"eosinophils_v2" = "#ADDD8E",
"eosinophils_v3" = "#F7FCD9",
"neutrophils_v1" = "#3182BD",
"neutrophils_v2" = "#9ECAE1",
"neutrophils_v3" = "#DEEBF7",
"biopsy_v1" = "#D95F0E"),
"type" = list(
"monocytes" = "#DD1C77",
"eosinophils" = "#31A354",
"neutrophils" = "#3182BD",
"biopsy" = "#D95F0E"),
"two_visit" = list(
"first" = "#33EE33",
"later" = "#023302"),
"visit" = list(
"v1" = "#CCCCCC",
"v2" = "#666666",
"v3" = "#111111"),
"visit2" = list(
"1" = "#CCCCCC",
"2" = "#666666",
"3" = "#111111"),
"visitbi" = list(
"v1" = "#BB0000",
"vother" = "#0000BB"),
"labs" = list(
"Brazil" = "#FFC300",
"Colombia" = "#525252"),
"clinic" = list(
"tumaco" = "#3182AA",
"cali" = "#C994AA"),
"clinic_cf" = list(
"tumaco_cure" = "#7670B3",
"tumaco_failure" = "#E7298A",
"cali_cure" = "#1B9E77",
"cali_failure" = "#D95F02"),
"ethnicity" = list(
"afrocol" = "#4293CE",
"indigena" = "#BDBDBD",
"mestiza" = "#FEB24C"),
"clinic_etnia" = list(
"cali_afrocol" = "#3182BD",
"cali_indigena" = "#636363",
"cali_mestiza" = "#F03B20",
"tumaco_afrocol" = "#9ECAE1",
"tumaco_indigena" = "#BDBDBD",
"tumaco_mestiza" = "#FEB24C"),
"sex" = list(
"female" = "#3182BD",
"male" = "#C994C7"),
"clinic_sex" = list(
"cali_female" = "#0000CC",
"tumaco_female" = "#3182BD",
"cali_male" = "#E7298A",
"tumaco_male" = "#C994C7"),
"cf_type" = list(
"cure_biopsy" = "#D95F0E",
"failure_biopsy" = "#FEC44F",
"cure_monocytes" = "#DD1C77",
"failure_monocytes" = "#C994C7",
"cure_eosinophils" = "#31A354",
"failure_eosinophils" = "#ADDD8E",
"cure_neutrophils" = "#3182BD",
"failure_neutrophils" = "#9ECAE1"),
"parasite" = list(
"notapplicable" = "#000000",
"lvbraziliensis" = "#AA0000",
"lvguyanensis" = "#660066",
"lvpanamensis" = "#3160AC"))
data_structures <- c(data_structures, "color_choices")Given the starting point above, we will start extracting groups of samples of interest.
The first set of samples removed from the data are those with too many missing genes.
all_nz <- plot_nonzero(hs_expt)## The following samples have less than 12968.8 genes.
## [1] "TMRC30010" "TMRC30140" "TMRC30280" "TMRC30284" "TMRC30050" "TMRC30056"
## [7] "TMRC30052" "TMRC30058" "TMRC30031" "TMRC30038" "TMRC30265"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
pp(file = "figures/S2_nonzero_all_samples.svg")
all_nz$plot## Warning: ggrepel: 192 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
dev.off()## png
## 2
all_nz## A non-zero genes plot of 210 samples.
## These samples have an average 45.18 CPM coverage and 14385 genes observed, ranging from 7647 to
## 16739.
## Warning: ggrepel: 194 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
We should save a cpm and rpkm copy of the raw data for every subset in order to make it easier for folks to work with WGCNA and other tools (STRING).
dir.create("cpm/3_Cali_and_Tumaco", recursive = TRUE)
dir.create("rpkm")## Warning in dir.create("rpkm"): 'rpkm' already exists
cpm_data <- normalize_expt(hs_expt, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/3_Cali_and_Tumaco/all_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(hs_expt, filter = TRUE, convert = "rpkm",
column = "mean_cds_len", na_to_zero = TRUE)## Removing 5168 low-count genes (14784 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/hs_expt_rpkm-v{ver}.csv"))To my eyes, there are 3 or 4 samples which are likely candidates for removal. In addition, we will remove samples which were lost during the treatment and/or ones which were used in other experiments but included in the TMRC3 sample sheet (thus the ‘notapplicable’ or ‘null’).
I think this is stated elsewhere: variables prefixed with ‘tc’ are Tumaco and Cali; ‘t’ or ‘c’ correspond.
tc_nolost <- subset_expt(hs_expt, subset = "finaloutcome!='lost'")## The samples excluded are: TMRC30009, TMRC30010, TMRC30015, TMRC30011, TMRC30012, TMRC30013, TMRC30025, TMRC30023, TMRC30033, TMRC30038, TMRC30036, TMRC30024, TMRC30034, TMRC30039, TMRC30040, TMRC30035.
## subset_expt(): There were 210, now there are 194 samples.
tc_nomilt <- subset_expt(tc_nolost, subset = "drug=='antimony'")## The samples excluded are: TMRC30188, TMRC30189, TMRC30190, TMRC30242, TMRC30260, TMRC30262, TMRC30261, TMRC30263.
## subset_expt(): There were 194, now there are 186 samples.
plot_legend(tc_nomilt)## The colors used in the expressionset are: #1B9E77, #D95F02.
nomilt_nonzero <- plot_nonzero(tc_nomilt, plot_labels = TRUE)## The following samples have less than 12968.8 genes.
## [1] "TMRC30140" "TMRC30280" "TMRC30284" "TMRC30050" "TMRC30056" "TMRC30052"
## [7] "TMRC30058" "TMRC30031" "TMRC30265"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
nomilt_nonzero## A non-zero genes plot of 186 samples.
## These samples have an average 48.28 CPM coverage and 14418 genes observed, ranging from 9934 to
## 16739.
## Warning: ggrepel: 166 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
pp(file = "figures/figs2_nonzero.svg")
nomilt_nonzero## A non-zero genes plot of 186 samples.
## These samples have an average 48.28 CPM coverage and 14418 genes observed, ranging from 9934 to
## 16739.
## Warning: ggrepel: 166 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
dev.off()## png
## 2
tc_valid <- subset_expt(tc_nomilt, nonzero = 11000) %>%
set_expt_colors(colors = color_choices[["cf"]])## The samples (and read coverage) removed when filtering 11000 non-zero genes are:
## TMRC30050 TMRC30052
## 807589 3086380
## by number of genes.
## subset_expt(): There were 186, now there are 184 samples.
save(list = "tc_valid", file = glue("rda/tmrc3_tc_valid-v{ver}.rda"))
data_structures <- c(data_structures, "tc_valid")clinic_type_outcome_sankey <- plot_meta_sankey(
tc_valid, factors = c("clinic", "typeofcells", "finaloutcome"),
drill_down = TRUE, color_choices = color_choices)## Warning: attributes are not identical across measure variables; they will be
## dropped
clinic_type_outcome_sankey## A sankey plot describing the metadata of 184 samples,
## including 24 out of 0 nodes and traversing metadata factors:
## .
clinic_ethnicity_outcome_sankey <- plot_meta_sankey(
tc_valid, factors = c("clinic", "etnia", "finaloutcome"),
drill_down = TRUE, color_choices = color_choices)## Warning: attributes are not identical across measure variables; they will be
## dropped
clinic_ethnicity_outcome_sankey## A sankey plot describing the metadata of 184 samples,
## including 17 out of 0 nodes and traversing metadata factors:
## .
clinic_sex_outcome_sankey <- plot_meta_sankey(
tc_valid, factors = c("clinic", "sex", "finaloutcome"),
drill_down = TRUE, color_choices = color_choices)## Warning: attributes are not identical across measure variables; they will be
## dropped
clinic_sex_outcome_sankey## A sankey plot describing the metadata of 184 samples,
## including 13 out of 0 nodes and traversing metadata factors:
## .
The following plot is essentially identical to the previous with two exceptions:
nz_post <- plot_nonzero(tc_valid)## The following samples have less than 12968.8 genes.
## [1] "TMRC30140" "TMRC30280" "TMRC30284" "TMRC30056" "TMRC30058" "TMRC30031"
## [7] "TMRC30265"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
nz_post## A non-zero genes plot of 184 samples.
## These samples have an average 48.79 CPM coverage and 14466 genes observed, ranging from 11448 to
## 16739.
## Warning: ggrepel: 164 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
pp(file = "figures/figS2v2_post.svg")
nz_post## A non-zero genes plot of 184 samples.
## These samples have an average 48.79 CPM coverage and 14466 genes observed, ranging from 11448 to
## 16739.
## Warning: ggrepel: 155 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
dev.off()## png
## 2
We need to keep track of how many of each sample type is lost when we do our various filters. Thus I am repeating the same set of tallies. This will likely happen one more time, following the removal of samples which came from Cali.
table(pData(tc_valid)[["drug"]])##
## antimony
## 184
table(pData(tc_valid)[["clinic"]])##
## cali tumaco
## 61 123
table(pData(tc_valid)[["finaloutcome"]])##
## cure failure
## 122 62
table(pData(tc_valid)[["typeofcells"]])##
## biopsy eosinophils monocytes neutrophils
## 18 41 63 62
table(pData(tc_valid)[["visit"]])## < table of extent 0 >
summary(as.numeric(pData(tc_valid)[["Evolution_Time"]]))## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 6.00 8.19 12.00 21.00
summary(as.numeric(pData(tc_valid)[["Prescribed_Daily_Dose_Gluc"]]))## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.0 14.8 19.0 17.5 20.0 20.0
summary(as.numeric(pData(tc_valid)[["V3_Vertical_Axis_Lesion"]]))## Warning in summary(as.numeric(pData(tc_valid)[["V3_Vertical_Axis_Lesion"]])):
## NAs introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 4.1 11.7 22.0 32.7 90.0 53
summary(as.numeric(pData(tc_valid)[["V3_Total_Area_Lesion"]]))## Warning in summary(as.numeric(pData(tc_valid)[["V3_Total_Area_Lesion"]])): NAs
## introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 56 507 2866 4315 16965 53
summary(as.numeric(pData(tc_valid)[["V3_Horizontal_Axis_Ulcer"]]))## Warning in summary(as.numeric(pData(tc_valid)[["V3_Horizontal_Axis_Ulcer"]])):
## NAs introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 0.0 0.0 11.4 16.9 49.7 53
table(pData(tc_valid)[["sex"]])##
## female male
## 28 156
table(pData(tc_valid)[["etnia"]])##
## afrocol indigena mestiza
## 91 46 47
summary(as.numeric(pData(tc_valid)[["Age"]]))## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 25.0 28.5 30.7 36.0 51.0
summary(pData(tc_valid)[["Weight"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 53.9 61.4 77.0 73.8 83.3 100.8
summary(pData(tc_valid)[["Height"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 152 162 166 167 174 183
length(unique(pData(tc_valid)[["Patient_ID"]]))## [1] 29
When we perform the logistic/linear models examining variables in the demographics, we need to ensure that we are working with only the people who are included in the final dataset.
people_remaining <- unique(pData(tc_valid)[["Patient_ID"]])
demographics_filtered_idx <- hs_demographics[["Patient_ID"]] %in% people_remaining
demographics_filtered <- hs_demographics[demographics_filtered_idx, ]
data_structures <- c(data_structures, "demographics_filtered")We now have two datasets which are comprised of only the cure/fail samples. It is worth considering if there are ways we can leverage this separation.
tc_cure <- subset_expt(tc_valid, subset = "finaloutcome=='cure'")## The samples excluded are: TMRC30178, TMRC30179, TMRC30221, TMRC30222, TMRC30223, TMRC30224, TMRC30017, TMRC30019, TMRC30071, TMRC30056, TMRC30105, TMRC30058, TMRC30094, TMRC30119, TMRC30122, TMRC30107, TMRC30096, TMRC30083, TMRC30115, TMRC30118, TMRC30121, TMRC30026, TMRC30048, TMRC30054, TMRC30046, TMRC30070, TMRC30049, TMRC30055, TMRC30047, TMRC30053, TMRC30068, TMRC30123, TMRC30072, TMRC30078, TMRC30116, TMRC30076, TMRC30088, TMRC30197, TMRC30199, TMRC30198, TMRC30201, TMRC30200, TMRC30203, TMRC30202, TMRC30205, TMRC30204, TMRC30177, TMRC30241, TMRC30237, TMRC30206, TMRC30207, TMRC30238, TMRC30074, TMRC30217, TMRC30208, TMRC30077, TMRC30219, TMRC30218, TMRC30079, TMRC30220, TMRC30264, TMRC30265.
## subset_expt(): There were 184, now there are 122 samples.
data_structures <- c(data_structures, "tc_cure")
table(pData(tc_cure)[["visitnumber"]])##
## 3 2 1
## 32 33 57
tc_fail <- subset_expt(tc_valid, subset = "finaloutcome=='failure'")## The samples excluded are: TMRC30156, TMRC30185, TMRC30186, TMRC30269, TMRC30148, TMRC30149, TMRC30253, TMRC30150, TMRC30140, TMRC30138, TMRC30176, TMRC30153, TMRC30151, TMRC30234, TMRC30235, TMRC30270, TMRC30225, TMRC30226, TMRC30227, TMRC30016, TMRC30228, TMRC30229, TMRC30230, TMRC30231, TMRC30232, TMRC30233, TMRC30018, TMRC30209, TMRC30210, TMRC30211, TMRC30212, TMRC30213, TMRC30216, TMRC30214, TMRC30215, TMRC30271, TMRC30273, TMRC30275, TMRC30272, TMRC30274, TMRC30276, TMRC30254, TMRC30255, TMRC30256, TMRC30277, TMRC30239, TMRC30240, TMRC30278, TMRC30279, TMRC30280, TMRC30257, TMRC30258, TMRC30281, TMRC30283, TMRC30284, TMRC30282, TMRC30285, TMRC30020, TMRC30113, TMRC30164, TMRC30080, TMRC30082, TMRC30103, TMRC30022, TMRC30169, TMRC30093, TMRC30029, TMRC30170, TMRC30032, TMRC30028, TMRC30180, TMRC30014, TMRC30196, TMRC30030, TMRC30021, TMRC30037, TMRC30031, TMRC30165, TMRC30027, TMRC30044, TMRC30194, TMRC30166, TMRC30195, TMRC30045, TMRC30191, TMRC30041, TMRC30171, TMRC30192, TMRC30139, TMRC30042, TMRC30158, TMRC30132, TMRC30160, TMRC30157, TMRC30183, TMRC30167, TMRC30181, TMRC30133, TMRC30043, TMRC30184, TMRC30159, TMRC30129, TMRC30172, TMRC30134, TMRC30174, TMRC30137, TMRC30161, TMRC30142, TMRC30175, TMRC30145, TMRC30143, TMRC30168, TMRC30146, TMRC30182, TMRC30152, TMRC30155, TMRC30154, TMRC30136, TMRC30135, TMRC30173, TMRC30144, TMRC30147.
## subset_expt(): There were 184, now there are 62 samples.
data_structures <- c(data_structures, "tc_fail")
table(pData(tc_fail)[["visitnumber"]])##
## 3 2 1
## 19 17 26
all_types <- table(pData(tc_valid)[["typeofcells"]])
all_types##
## biopsy eosinophils monocytes neutrophils
## 18 41 63 62
all_times <- table(pData(tc_valid)[["visitnumber"]])
all_times##
## 3 2 1
## 51 50 83
We completed the initial filter of the data. Write it out. This data should include all samples except those we explicitly removed due to poor coverage, lost-status, or non-canonical cell type (the pilot study).
cpm_data <- normalize_expt(tc_valid, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/3_Cali_and_Tumaco/all_valid_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tc_valid, filter = TRUE, convert = "rpkm",
column = "mean_cds_len", na_to_zero = TRUE)## Removing 5654 low-count genes (14298 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tc_all_valid_samples-v{ver}.csv"))Extract the samples of each cell type, these will be the basis for a significant number of the clinical outcome comparisons.
## Note, I will save these data structures in a little bit when I
## decide how I want to set the conditions, batches, and colors.
tc_eosinophils <- subset_expt(tc_valid, subset = "typeofcells=='eosinophils'")## The samples excluded are: TMRC30156, TMRC30185, TMRC30186, TMRC30178, TMRC30179, TMRC30221, TMRC30222, TMRC30223, TMRC30224, TMRC30269, TMRC30148, TMRC30149, TMRC30253, TMRC30150, TMRC30140, TMRC30176, TMRC30153, TMRC30234, TMRC30235, TMRC30270, TMRC30225, TMRC30226, TMRC30016, TMRC30228, TMRC30229, TMRC30017, TMRC30231, TMRC30232, TMRC30018, TMRC30209, TMRC30210, TMRC30212, TMRC30213, TMRC30214, TMRC30215, TMRC30273, TMRC30275, TMRC30274, TMRC30276, TMRC30255, TMRC30256, TMRC30239, TMRC30240, TMRC30279, TMRC30280, TMRC30019, TMRC30258, TMRC30283, TMRC30284, TMRC30285, TMRC30020, TMRC30056, TMRC30105, TMRC30058, TMRC30080, TMRC30094, TMRC30082, TMRC30103, TMRC30022, TMRC30169, TMRC30093, TMRC30107, TMRC30170, TMRC30096, TMRC30083, TMRC30115, TMRC30118, TMRC30014, TMRC30121, TMRC30030, TMRC30021, TMRC30026, TMRC30037, TMRC30031, TMRC30165, TMRC30027, TMRC30044, TMRC30194, TMRC30166, TMRC30195, TMRC30045, TMRC30046, TMRC30049, TMRC30055, TMRC30047, TMRC30191, TMRC30053, TMRC30041, TMRC30068, TMRC30171, TMRC30192, TMRC30139, TMRC30042, TMRC30158, TMRC30132, TMRC30160, TMRC30157, TMRC30183, TMRC30167, TMRC30123, TMRC30181, TMRC30072, TMRC30133, TMRC30078, TMRC30116, TMRC30184, TMRC30076, TMRC30129, TMRC30088, TMRC30172, TMRC30134, TMRC30174, TMRC30137, TMRC30142, TMRC30175, TMRC30145, TMRC30143, TMRC30197, TMRC30146, TMRC30199, TMRC30198, TMRC30201, TMRC30200, TMRC30203, TMRC30202, TMRC30205, TMRC30204, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241, TMRC30237, TMRC30206, TMRC30207, TMRC30238, TMRC30217, TMRC30208, TMRC30219, TMRC30218, TMRC30220, TMRC30264, TMRC30265.
## subset_expt(): There were 184, now there are 41 samples.
tc_monocytes <- subset_expt(tc_valid, subset = "typeofcells=='monocytes'")## The samples excluded are: TMRC30156, TMRC30186, TMRC30179, TMRC30222, TMRC30224, TMRC30269, TMRC30149, TMRC30253, TMRC30140, TMRC30138, TMRC30153, TMRC30151, TMRC30235, TMRC30270, TMRC30226, TMRC30227, TMRC30016, TMRC30229, TMRC30230, TMRC30017, TMRC30232, TMRC30233, TMRC30018, TMRC30210, TMRC30211, TMRC30213, TMRC30216, TMRC30215, TMRC30271, TMRC30275, TMRC30272, TMRC30276, TMRC30254, TMRC30256, TMRC30277, TMRC30240, TMRC30278, TMRC30280, TMRC30257, TMRC30019, TMRC30281, TMRC30284, TMRC30282, TMRC30285, TMRC30071, TMRC30020, TMRC30113, TMRC30058, TMRC30164, TMRC30094, TMRC30119, TMRC30103, TMRC30122, TMRC30022, TMRC30093, TMRC30029, TMRC30170, TMRC30032, TMRC30083, TMRC30028, TMRC30118, TMRC30180, TMRC30121, TMRC30196, TMRC30021, TMRC30026, TMRC30031, TMRC30027, TMRC30044, TMRC30166, TMRC30195, TMRC30048, TMRC30054, TMRC30045, TMRC30070, TMRC30047, TMRC30053, TMRC30068, TMRC30192, TMRC30042, TMRC30158, TMRC30160, TMRC30167, TMRC30181, TMRC30133, TMRC30043, TMRC30116, TMRC30076, TMRC30159, TMRC30088, TMRC30134, TMRC30137, TMRC30161, TMRC30175, TMRC30143, TMRC30168, TMRC30146, TMRC30182, TMRC30198, TMRC30200, TMRC30202, TMRC30204, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241, TMRC30206, TMRC30136, TMRC30238, TMRC30074, TMRC30208, TMRC30077, TMRC30218, TMRC30079, TMRC30220, TMRC30135, TMRC30173, TMRC30144, TMRC30147, TMRC30265.
## subset_expt(): There were 184, now there are 63 samples.
tc_neutrophils <- subset_expt(tc_valid, subset = "typeofcells=='neutrophils'")## The samples excluded are: TMRC30156, TMRC30185, TMRC30178, TMRC30221, TMRC30223, TMRC30269, TMRC30148, TMRC30253, TMRC30150, TMRC30138, TMRC30176, TMRC30151, TMRC30234, TMRC30270, TMRC30225, TMRC30227, TMRC30016, TMRC30228, TMRC30230, TMRC30017, TMRC30231, TMRC30233, TMRC30018, TMRC30209, TMRC30211, TMRC30212, TMRC30216, TMRC30214, TMRC30271, TMRC30273, TMRC30272, TMRC30274, TMRC30254, TMRC30255, TMRC30277, TMRC30239, TMRC30278, TMRC30279, TMRC30257, TMRC30019, TMRC30258, TMRC30281, TMRC30283, TMRC30282, TMRC30071, TMRC30020, TMRC30056, TMRC30113, TMRC30105, TMRC30164, TMRC30080, TMRC30119, TMRC30082, TMRC30122, TMRC30022, TMRC30169, TMRC30029, TMRC30107, TMRC30032, TMRC30096, TMRC30028, TMRC30115, TMRC30180, TMRC30014, TMRC30196, TMRC30030, TMRC30026, TMRC30037, TMRC30165, TMRC30044, TMRC30194, TMRC30048, TMRC30054, TMRC30045, TMRC30046, TMRC30070, TMRC30049, TMRC30055, TMRC30191, TMRC30041, TMRC30171, TMRC30139, TMRC30132, TMRC30157, TMRC30183, TMRC30123, TMRC30072, TMRC30043, TMRC30078, TMRC30184, TMRC30159, TMRC30129, TMRC30172, TMRC30174, TMRC30161, TMRC30142, TMRC30145, TMRC30168, TMRC30197, TMRC30182, TMRC30199, TMRC30201, TMRC30203, TMRC30205, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241, TMRC30237, TMRC30136, TMRC30207, TMRC30074, TMRC30217, TMRC30077, TMRC30219, TMRC30079, TMRC30135, TMRC30173, TMRC30264, TMRC30144, TMRC30147.
## subset_expt(): There were 184, now there are 62 samples.
The following writes out rpkm/cpm values for each celltype from both clinics(tc prefix).
cpm_data <- normalize_expt(tc_eosinophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/3_Cali_and_Tumaco/tc_eosinophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tc_eosinophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9085 low-count genes (10867 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tc_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(tc_monocytes, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/3_Cali_and_Tumaco/tc_monocytes-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tc_monocytes, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 8844 low-count genes (11108 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tc_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(tc_neutrophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/3_Cali_and_Tumaco/tc_neutrophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tc_neutrophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 10708 low-count genes (9244 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tc_neutrophils-v{ver}.csv"))The prefixes of the following datastructures state that these are from both clinics (tc) and a specific visit (v1/v2/v3). The rpkm/cpm values will be written immediately thereafter.
tcv1_samples <- subset_expt(tc_valid, subset = "visitnumber=='1'")## The samples excluded are: TMRC30178, TMRC30179, TMRC30223, TMRC30224, TMRC30150, TMRC30140, TMRC30176, TMRC30153, TMRC30151, TMRC30228, TMRC30229, TMRC30230, TMRC30231, TMRC30232, TMRC30233, TMRC30212, TMRC30213, TMRC30216, TMRC30214, TMRC30215, TMRC30272, TMRC30274, TMRC30276, TMRC30254, TMRC30255, TMRC30256, TMRC30277, TMRC30278, TMRC30279, TMRC30280, TMRC30282, TMRC30285, TMRC30056, TMRC30113, TMRC30105, TMRC30058, TMRC30164, TMRC30094, TMRC30119, TMRC30082, TMRC30122, TMRC30169, TMRC30093, TMRC30170, TMRC30032, TMRC30096, TMRC30028, TMRC30115, TMRC30118, TMRC30121, TMRC30196, TMRC30030, TMRC30037, TMRC30031, TMRC30027, TMRC30194, TMRC30195, TMRC30054, TMRC30070, TMRC30049, TMRC30055, TMRC30053, TMRC30068, TMRC30171, TMRC30139, TMRC30158, TMRC30160, TMRC30157, TMRC30183, TMRC30181, TMRC30072, TMRC30133, TMRC30078, TMRC30076, TMRC30159, TMRC30129, TMRC30088, TMRC30172, TMRC30137, TMRC30161, TMRC30142, TMRC30145, TMRC30143, TMRC30146, TMRC30182, TMRC30199, TMRC30201, TMRC30200, TMRC30202, TMRC30205, TMRC30206, TMRC30136, TMRC30217, TMRC30077, TMRC30219, TMRC30218, TMRC30079, TMRC30220, TMRC30173, TMRC30144, TMRC30147.
## subset_expt(): There were 184, now there are 83 samples.
save(list = "tcv1_samples", file = glue("rda/tmrc3_tcv1_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tcv1_samples")
tcv2_samples <- subset_expt(tc_valid, subset = "visitnumber=='2'")## The samples excluded are: TMRC30156, TMRC30185, TMRC30186, TMRC30178, TMRC30179, TMRC30221, TMRC30222, TMRC30269, TMRC30148, TMRC30149, TMRC30253, TMRC30138, TMRC30176, TMRC30153, TMRC30234, TMRC30235, TMRC30270, TMRC30225, TMRC30226, TMRC30227, TMRC30016, TMRC30017, TMRC30231, TMRC30232, TMRC30233, TMRC30018, TMRC30209, TMRC30210, TMRC30211, TMRC30216, TMRC30214, TMRC30215, TMRC30271, TMRC30273, TMRC30275, TMRC30254, TMRC30255, TMRC30256, TMRC30239, TMRC30240, TMRC30278, TMRC30279, TMRC30280, TMRC30257, TMRC30019, TMRC30258, TMRC30281, TMRC30283, TMRC30284, TMRC30285, TMRC30071, TMRC30020, TMRC30105, TMRC30164, TMRC30080, TMRC30094, TMRC30103, TMRC30122, TMRC30022, TMRC30169, TMRC30029, TMRC30107, TMRC30170, TMRC30083, TMRC30028, TMRC30115, TMRC30180, TMRC30014, TMRC30121, TMRC30021, TMRC30026, TMRC30037, TMRC30165, TMRC30027, TMRC30044, TMRC30166, TMRC30048, TMRC30045, TMRC30046, TMRC30070, TMRC30055, TMRC30047, TMRC30191, TMRC30041, TMRC30068, TMRC30192, TMRC30139, TMRC30042, TMRC30132, TMRC30160, TMRC30183, TMRC30167, TMRC30123, TMRC30133, TMRC30043, TMRC30078, TMRC30116, TMRC30184, TMRC30088, TMRC30172, TMRC30134, TMRC30174, TMRC30161, TMRC30175, TMRC30145, TMRC30168, TMRC30197, TMRC30146, TMRC30198, TMRC30201, TMRC30203, TMRC30202, TMRC30205, TMRC30204, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241, TMRC30237, TMRC30206, TMRC30136, TMRC30207, TMRC30238, TMRC30074, TMRC30208, TMRC30219, TMRC30079, TMRC30220, TMRC30135, TMRC30173, TMRC30264, TMRC30147, TMRC30265.
## subset_expt(): There were 184, now there are 50 samples.
save(list = "tcv2_samples", file = glue("rda/tmrc3_tcv2_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tcv2_samples")
tcv3_samples <- subset_expt(tc_valid, subset = "visitnumber=='3'")## The samples excluded are: TMRC30156, TMRC30185, TMRC30186, TMRC30221, TMRC30222, TMRC30223, TMRC30224, TMRC30269, TMRC30148, TMRC30149, TMRC30253, TMRC30150, TMRC30140, TMRC30138, TMRC30151, TMRC30234, TMRC30235, TMRC30270, TMRC30225, TMRC30226, TMRC30227, TMRC30016, TMRC30228, TMRC30229, TMRC30230, TMRC30017, TMRC30018, TMRC30209, TMRC30210, TMRC30211, TMRC30212, TMRC30213, TMRC30271, TMRC30273, TMRC30275, TMRC30272, TMRC30274, TMRC30276, TMRC30277, TMRC30239, TMRC30240, TMRC30257, TMRC30019, TMRC30258, TMRC30281, TMRC30283, TMRC30284, TMRC30282, TMRC30071, TMRC30020, TMRC30056, TMRC30113, TMRC30058, TMRC30080, TMRC30119, TMRC30082, TMRC30103, TMRC30022, TMRC30093, TMRC30029, TMRC30107, TMRC30032, TMRC30096, TMRC30083, TMRC30118, TMRC30180, TMRC30014, TMRC30196, TMRC30030, TMRC30021, TMRC30026, TMRC30031, TMRC30165, TMRC30044, TMRC30194, TMRC30166, TMRC30195, TMRC30048, TMRC30054, TMRC30045, TMRC30046, TMRC30049, TMRC30047, TMRC30191, TMRC30053, TMRC30041, TMRC30171, TMRC30192, TMRC30042, TMRC30158, TMRC30132, TMRC30157, TMRC30167, TMRC30123, TMRC30181, TMRC30072, TMRC30043, TMRC30116, TMRC30184, TMRC30076, TMRC30159, TMRC30129, TMRC30134, TMRC30174, TMRC30137, TMRC30142, TMRC30175, TMRC30143, TMRC30168, TMRC30197, TMRC30182, TMRC30199, TMRC30198, TMRC30200, TMRC30203, TMRC30204, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241, TMRC30237, TMRC30207, TMRC30238, TMRC30074, TMRC30217, TMRC30208, TMRC30077, TMRC30218, TMRC30135, TMRC30264, TMRC30144, TMRC30265.
## subset_expt(): There were 184, now there are 51 samples.
save(list = "tcv3_samples", file = glue("rda/tmrc3_tcv3_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tcv3_samples")cpm_data <- normalize_expt(tcv1_samples, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/3_Cali_and_Tumaco/tcv1_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tcv1_samples, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 5793 low-count genes (14159 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tcv1_samples-v{ver}.csv"))
cpm_data <- normalize_expt(tcv2_samples, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/3_Cali_and_Tumaco/tcv2_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tcv2_samples, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 8210 low-count genes (11742 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tcv2_samples-v{ver}.csv"))
cpm_data <- normalize_expt(tcv3_samples, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/3_Cali_and_Tumaco/tcv3_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tcv3_samples, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 8285 low-count genes (11667 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tcv3_samples-v{ver}.csv"))All data structures which start with the prefix ‘tc’ are Tumaco and Cali. Those with ‘t’ are only Tumaco, ‘c’ are only Cali.
tc_clinical <- tc_valid %>%
set_expt_conditions(fact = "finaloutcome", colors = color_choices[["cf"]]) %>%
set_expt_batches(fact = "typeofcells")## The numbers of samples by condition are:
##
## cure failure
## 122 62
## The number of samples by batch are:
##
## biopsy eosinophils monocytes neutrophils
## 18 41 63 62
save(list = "tc_clinical", file = glue("rda/tmrc3_tc_clinical-v{ver}.rda"))
data_structures <- c(data_structures, "tc_clinical")For some of the following data structures, we will be concatenating various metadata factors of interest, usually the final outcome and one other metadatum.
In the following block I am writing out the Monocytes in a couple different ways, once with the clinical outcome as the primary factor, and again using the concatenation of clinic/outcome.
clinic_cf <- paste0(pData(tc_monocytes)[["clinic"]], "_",
pData(tc_monocytes)[["finaloutcome"]])
table(clinic_cf)## clinic_cf
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 21 21
tc_monocytes <- set_expt_conditions(
tc_monocytes, fact = clinic_cf, colors = color_choices[["clinic_cf"]]) %>%
set_expt_batches(fact = "visitnumber")## The numbers of samples by condition are:
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 21 21
## The number of samples by batch are:
##
## 3 2 1
## 19 18 26
save(list = "tc_monocytes", file = glue("rda/tmrc3_tc_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tc_monocytes")
tcv1_monocytes <- subset_expt(tc_monocytes, subset = "visitnumber=='1'")## The samples excluded are: TMRC30178, TMRC30223, TMRC30150, TMRC30176, TMRC30228, TMRC30231, TMRC30212, TMRC30214, TMRC30274, TMRC30255, TMRC30279, TMRC30056, TMRC30105, TMRC30082, TMRC30169, TMRC30096, TMRC30115, TMRC30030, TMRC30037, TMRC30194, TMRC30049, TMRC30055, TMRC30171, TMRC30139, TMRC30157, TMRC30183, TMRC30072, TMRC30078, TMRC30129, TMRC30172, TMRC30142, TMRC30145, TMRC30199, TMRC30201, TMRC30205, TMRC30217, TMRC30219.
## subset_expt(): There were 63, now there are 26 samples.
save(list = "tcv1_monocytes", file = glue("rda/tmrc3_tcv1_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tcv1_monocytes")
tcv2_monocytes <- subset_expt(tc_monocytes, subset = "visitnumber=='2'")## The samples excluded are: TMRC30185, TMRC30178, TMRC30221, TMRC30148, TMRC30176, TMRC30234, TMRC30225, TMRC30231, TMRC30209, TMRC30214, TMRC30273, TMRC30255, TMRC30239, TMRC30279, TMRC30258, TMRC30283, TMRC30105, TMRC30080, TMRC30169, TMRC30107, TMRC30115, TMRC30014, TMRC30037, TMRC30165, TMRC30046, TMRC30055, TMRC30191, TMRC30041, TMRC30139, TMRC30132, TMRC30183, TMRC30123, TMRC30078, TMRC30184, TMRC30172, TMRC30174, TMRC30145, TMRC30197, TMRC30201, TMRC30203, TMRC30205, TMRC30237, TMRC30207, TMRC30219, TMRC30264.
## subset_expt(): There were 63, now there are 18 samples.
save(list = "tcv2_monocytes", file = glue("rda/tmrc3_tcv2_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tcv2_monocytes")
tcv3_monocytes <- subset_expt(tc_monocytes, subset = "visitnumber=='3'")## The samples excluded are: TMRC30185, TMRC30221, TMRC30223, TMRC30148, TMRC30150, TMRC30234, TMRC30225, TMRC30228, TMRC30209, TMRC30212, TMRC30273, TMRC30274, TMRC30239, TMRC30258, TMRC30283, TMRC30056, TMRC30080, TMRC30082, TMRC30107, TMRC30096, TMRC30014, TMRC30030, TMRC30165, TMRC30194, TMRC30046, TMRC30049, TMRC30191, TMRC30041, TMRC30171, TMRC30132, TMRC30157, TMRC30123, TMRC30072, TMRC30184, TMRC30129, TMRC30174, TMRC30142, TMRC30197, TMRC30199, TMRC30203, TMRC30237, TMRC30207, TMRC30217, TMRC30264.
## subset_expt(): There were 63, now there are 19 samples.
save(list = "tcv3_monocytes", file = glue("rda/tmrc3_tcv3_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tcv3_monocytes")Followed by the Biopsy samples…
tc_clinical_nobiop <- subset_expt(tc_clinical, subset = "typeofcells!='biopsy'") %>%
set_expt_colors(color_choices[["cf"]])## The samples excluded are: TMRC30156, TMRC30269, TMRC30253, TMRC30270, TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30020, TMRC30022, TMRC30026, TMRC30044, TMRC30045, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241.
## subset_expt(): There were 184, now there are 166 samples.
save(list = "tc_clinical_nobiop", file = glue("rda/tmrc3_tc_clinical_nobiop-v{ver}.rda"))
data_structures <- c(data_structures, "tc_clinical_nobiop")
tc_biopsies <- tc_valid %>%
set_expt_conditions(fact = "clinic") %>%
set_expt_batches(fact = "finaloutcome") %>%
subset_expt(subset = "typeofcells=='biopsy'")## The numbers of samples by condition are:
##
## cali tumaco
## 61 123
## The number of samples by batch are:
##
## cure failure
## 122 62
## The samples excluded are: TMRC30185, TMRC30186, TMRC30178, TMRC30179, TMRC30221, TMRC30222, TMRC30223, TMRC30224, TMRC30148, TMRC30149, TMRC30150, TMRC30140, TMRC30138, TMRC30176, TMRC30153, TMRC30151, TMRC30234, TMRC30235, TMRC30225, TMRC30226, TMRC30227, TMRC30228, TMRC30229, TMRC30230, TMRC30231, TMRC30232, TMRC30233, TMRC30209, TMRC30210, TMRC30211, TMRC30212, TMRC30213, TMRC30216, TMRC30214, TMRC30215, TMRC30271, TMRC30273, TMRC30275, TMRC30272, TMRC30274, TMRC30276, TMRC30254, TMRC30255, TMRC30256, TMRC30277, TMRC30239, TMRC30240, TMRC30278, TMRC30279, TMRC30280, TMRC30257, TMRC30258, TMRC30281, TMRC30283, TMRC30284, TMRC30282, TMRC30285, TMRC30071, TMRC30056, TMRC30113, TMRC30105, TMRC30058, TMRC30164, TMRC30080, TMRC30094, TMRC30119, TMRC30082, TMRC30103, TMRC30122, TMRC30169, TMRC30093, TMRC30029, TMRC30107, TMRC30170, TMRC30032, TMRC30096, TMRC30083, TMRC30028, TMRC30115, TMRC30118, TMRC30180, TMRC30014, TMRC30121, TMRC30196, TMRC30030, TMRC30021, TMRC30037, TMRC30031, TMRC30165, TMRC30027, TMRC30194, TMRC30166, TMRC30195, TMRC30048, TMRC30054, TMRC30046, TMRC30070, TMRC30049, TMRC30055, TMRC30047, TMRC30191, TMRC30053, TMRC30041, TMRC30068, TMRC30171, TMRC30192, TMRC30139, TMRC30042, TMRC30158, TMRC30132, TMRC30160, TMRC30157, TMRC30183, TMRC30167, TMRC30123, TMRC30181, TMRC30072, TMRC30133, TMRC30043, TMRC30078, TMRC30116, TMRC30184, TMRC30076, TMRC30159, TMRC30129, TMRC30088, TMRC30172, TMRC30134, TMRC30174, TMRC30137, TMRC30161, TMRC30142, TMRC30175, TMRC30145, TMRC30143, TMRC30168, TMRC30197, TMRC30146, TMRC30182, TMRC30199, TMRC30198, TMRC30201, TMRC30200, TMRC30203, TMRC30202, TMRC30205, TMRC30204, TMRC30237, TMRC30206, TMRC30136, TMRC30207, TMRC30238, TMRC30074, TMRC30217, TMRC30208, TMRC30077, TMRC30219, TMRC30218, TMRC30079, TMRC30220, TMRC30135, TMRC30173, TMRC30264, TMRC30144, TMRC30147, TMRC30265.
## subset_expt(): There were 184, now there are 18 samples.
tc_cf <- paste0(pData(tc_biopsies)[["clinic"]], "_",
pData(tc_biopsies)[["finaloutcome"]])
table(tc_cf)## tc_cf
## cali_cure tumaco_cure tumaco_failure
## 4 9 5
tc_biopsies <- set_expt_conditions(
tc_biopsies, fact = tc_cf, colors = color_choices[["clinic_cf"]]) %>%
set_expt_batches(fact = "visitnumber")## The numbers of samples by condition are:
##
## cali_cure tumaco_cure tumaco_failure
## 4 9 5
## Warning in set_expt_colors(new_expt, colors = colors): Colors for the following
## categories are not being used: cali_failure.
## The number of samples by batch are:
##
## 1
## 18
save(list = "tc_biopsies", file = glue("rda/tmrc3_tc_biopsies-v{ver}.rda"))
data_structures <- c(data_structures, "tc_biopsies")… and the Eosinophils. These are noteworthy because they have fewer fails than some other cohorts.
clinic_cf <- paste0(pData(tc_eosinophils)[["clinic"]], "_",
pData(tc_eosinophils)[["finaloutcome"]])
table(clinic_cf)## clinic_cf
## cali_cure tumaco_cure tumaco_failure
## 15 17 9
tc_eosinophils <- set_expt_conditions(
tc_eosinophils, fact = clinic_cf, colors = color_choices[["clinic_cf"]]) %>%
set_expt_batches(fact = "visitnumber")## The numbers of samples by condition are:
##
## cali_cure tumaco_cure tumaco_failure
## 15 17 9
## Warning in set_expt_colors(new_expt, colors = colors): Colors for the following
## categories are not being used: cali_failure.
## The number of samples by batch are:
##
## 3 2 1
## 13 14 14
save(list = "tc_eosinophils", file = glue("rda/tmrc3_tc_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tc_eosinophils")
tcv1_eosinophils <- subset_expt(tc_eosinophils, subset = "visitnumber=='1'")## The samples excluded are: TMRC30151, TMRC30230, TMRC30233, TMRC30216, TMRC30272, TMRC30254, TMRC30277, TMRC30278, TMRC30282, TMRC30113, TMRC30164, TMRC30119, TMRC30122, TMRC30032, TMRC30028, TMRC30196, TMRC30054, TMRC30070, TMRC30159, TMRC30161, TMRC30182, TMRC30136, TMRC30077, TMRC30079, TMRC30173, TMRC30144, TMRC30147.
## subset_expt(): There were 41, now there are 14 samples.
save(list = "tcv1_eosinophils", file = glue("rda/tmrc3_tcv1_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv1_eosinophils")
tcv2_eosinophils <- subset_expt(tc_eosinophils, subset = "visitnumber=='2'")## The samples excluded are: TMRC30138, TMRC30227, TMRC30233, TMRC30211, TMRC30216, TMRC30271, TMRC30254, TMRC30278, TMRC30257, TMRC30281, TMRC30071, TMRC30164, TMRC30122, TMRC30029, TMRC30028, TMRC30180, TMRC30048, TMRC30070, TMRC30043, TMRC30161, TMRC30168, TMRC30136, TMRC30074, TMRC30079, TMRC30135, TMRC30173, TMRC30147.
## subset_expt(): There were 41, now there are 14 samples.
save(list = "tcv2_eosinophils", file = glue("rda/tmrc3_tcv2_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv2_eosinophils")
tcv3_eosinophils <- subset_expt(tc_eosinophils, subset = "visitnumber=='3'")## The samples excluded are: TMRC30138, TMRC30151, TMRC30227, TMRC30230, TMRC30211, TMRC30271, TMRC30272, TMRC30277, TMRC30257, TMRC30281, TMRC30282, TMRC30071, TMRC30113, TMRC30119, TMRC30029, TMRC30032, TMRC30180, TMRC30196, TMRC30048, TMRC30054, TMRC30043, TMRC30159, TMRC30168, TMRC30182, TMRC30074, TMRC30077, TMRC30135, TMRC30144.
## subset_expt(): There were 41, now there are 13 samples.
save(list = "tcv3_eosinophils", file = glue("rda/tmrc3_tcv3_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv3_eosinophils")Followed by the Neutrophils…
clinic_cf <- paste0(pData(tc_neutrophils)[["clinic"]], "_",
pData(tc_neutrophils)[["finaloutcome"]])
table(clinic_cf)## clinic_cf
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 20 21
tc_neutrophils <- set_expt_conditions(
tc_neutrophils, fact = clinic_cf, colors = color_choices[["clinic_cf"]]) %>%
set_expt_batches(fact = "visitnumber")## The numbers of samples by condition are:
##
## cali_cure cali_failure tumaco_cure tumaco_failure
## 18 3 20 21
## The number of samples by batch are:
##
## 3 2 1
## 19 18 25
save(list = "tc_neutrophils", file = glue("rda/tmrc3_tc_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tc_neutrophils")
tcv1_neutrophils <- subset_expt(tc_neutrophils, subset = "visitnumber=='1'")## The samples excluded are: TMRC30179, TMRC30224, TMRC30140, TMRC30153, TMRC30229, TMRC30232, TMRC30213, TMRC30215, TMRC30276, TMRC30256, TMRC30280, TMRC30285, TMRC30058, TMRC30094, TMRC30093, TMRC30170, TMRC30118, TMRC30121, TMRC30031, TMRC30027, TMRC30195, TMRC30053, TMRC30068, TMRC30158, TMRC30160, TMRC30181, TMRC30133, TMRC30076, TMRC30088, TMRC30137, TMRC30143, TMRC30146, TMRC30200, TMRC30202, TMRC30206, TMRC30218, TMRC30220.
## subset_expt(): There were 62, now there are 25 samples.
save(list = "tcv1_neutrophils", file = glue("rda/tmrc3_tcv1_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv1_neutrophils")
tcv2_neutrophils <- subset_expt(tc_neutrophils, subset = "visitnumber=='2'")## The samples excluded are: TMRC30186, TMRC30179, TMRC30222, TMRC30149, TMRC30153, TMRC30235, TMRC30226, TMRC30232, TMRC30210, TMRC30215, TMRC30275, TMRC30256, TMRC30240, TMRC30280, TMRC30284, TMRC30285, TMRC30094, TMRC30103, TMRC30170, TMRC30083, TMRC30121, TMRC30021, TMRC30027, TMRC30166, TMRC30047, TMRC30068, TMRC30192, TMRC30042, TMRC30160, TMRC30167, TMRC30133, TMRC30116, TMRC30088, TMRC30134, TMRC30175, TMRC30146, TMRC30198, TMRC30202, TMRC30204, TMRC30206, TMRC30238, TMRC30208, TMRC30220, TMRC30265.
## subset_expt(): There were 62, now there are 18 samples.
save(list = "tcv2_neutrophils", file = glue("rda/tmrc3_tcv2_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv2_neutrophils")
tcv3_neutrophils <- subset_expt(tc_neutrophils, subset = "visitnumber=='3'")## The samples excluded are: TMRC30186, TMRC30222, TMRC30224, TMRC30149, TMRC30140, TMRC30235, TMRC30226, TMRC30229, TMRC30210, TMRC30213, TMRC30275, TMRC30276, TMRC30240, TMRC30284, TMRC30058, TMRC30103, TMRC30093, TMRC30083, TMRC30118, TMRC30021, TMRC30031, TMRC30166, TMRC30195, TMRC30047, TMRC30053, TMRC30192, TMRC30042, TMRC30158, TMRC30167, TMRC30181, TMRC30116, TMRC30076, TMRC30134, TMRC30137, TMRC30175, TMRC30143, TMRC30198, TMRC30200, TMRC30204, TMRC30238, TMRC30208, TMRC30218, TMRC30265.
## subset_expt(): There were 62, now there are 19 samples.
save(list = "tcv3_neutrophils", file = glue("rda/tmrc3_tcv3_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv3_neutrophils")tc_sex <- set_expt_conditions(tc_valid, fact = "sex", colors = color_choices[["sex"]])## The numbers of samples by condition are:
##
## female male
## 28 156
save(list = "tc_sex", file = glue("rda/tmrc3_tc_sex-v{ver}.rda"))
data_structures <- c(data_structures, "tc_sex")Our recent discussions have settled one big question regarding which samples to use: We will limit our analyses to only those samples from Tumaco.
The following block will therefore set that group as our default for future analyses. We will essentially repeat all of the above sample separations for the Tumaco-only cohort.
t_clinical <- tc_clinical %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30156, TMRC30185, TMRC30186, TMRC30178, TMRC30179, TMRC30221, TMRC30222, TMRC30223, TMRC30224, TMRC30269, TMRC30148, TMRC30149, TMRC30253, TMRC30150, TMRC30140, TMRC30138, TMRC30176, TMRC30153, TMRC30151, TMRC30234, TMRC30235, TMRC30270, TMRC30225, TMRC30226, TMRC30227, TMRC30228, TMRC30229, TMRC30230, TMRC30231, TMRC30232, TMRC30233, TMRC30209, TMRC30210, TMRC30211, TMRC30212, TMRC30213, TMRC30216, TMRC30214, TMRC30215, TMRC30271, TMRC30273, TMRC30275, TMRC30272, TMRC30274, TMRC30276, TMRC30254, TMRC30255, TMRC30256, TMRC30277, TMRC30239, TMRC30240, TMRC30278, TMRC30279, TMRC30280, TMRC30257, TMRC30258, TMRC30281, TMRC30283, TMRC30284, TMRC30282, TMRC30285.
## subset_expt(): There were 184, now there are 123 samples.
save(list = "t_clinical", file = glue("rda/tmrc3_t_clinical-v{ver}.rda"))
data_structures <- c(data_structures, "t_clinical")
cpm_data <- normalize_expt(t_clinical, convert = "cpm")
t_clinical_cpm_variance <- variance_expt(cpm_data)
## Add variance by gene for all samples excluding biopsies (I assume also excluding Cali).
## This is intended to address a query from Maria Adelaida on 20230111written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_clinical-v{ver}.xlsx"))
rpkm_data <- normalize_expt(t_clinical, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 5796 low-count genes (14156 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_clinical-v{ver}.csv"))
cpm_data <- normalize_expt(t_clinical, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 5796 low-count genes (14156 remaining).
## Setting 17331 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_clinical_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(t_clinical, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 5796 low-count genes (14156 remaining).
## Setting 9184 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_clinical_sva-v{ver}.csv"))written_expt <- write_expt(
t_clinical,
excel = glue("excel/t_clinical-v{ver}.xlsx"))## Writing the first sheet, containing a legend and some summary data.
## The following samples have less than 12968.8 genes.
## [1] "TMRC30056" "TMRC30058" "TMRC30031" "TMRC30265"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## 646953 entries are 0. We are on a log scale, adding 1 to the data.
##
## Naively calculating coefficient of variation/dispersion with respect to condition.
##
## Finished calculating dispersion estimates.
##
## `geom_smooth()` using formula = 'y ~ x'
## Removing 5796 low-count genes (14156 remaining).
##
## transform_counts: Found 126745 values equal to 0, adding 1 to the matrix.
##
## `geom_smooth()` using formula = 'y ~ x'
dir.create("cpm/4_Tumaco", recursive = TRUE)## Warning in dir.create("cpm/4_Tumaco", recursive = TRUE): 'cpm/4_Tumaco' already
## exists
t_clinical_nobiop <- subset_expt(t_clinical, subset = "typeofcells!='biopsy'")## The samples excluded are: TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30020, TMRC30022, TMRC30026, TMRC30044, TMRC30045, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241.
## subset_expt(): There were 123, now there are 109 samples.
save(list = "t_clinical_nobiop", file = glue("rda/tmrc3_t_clinical_nobiop-v{ver}.rda"))
data_structures <- c(data_structures, "t_clinical_nobiop")
cpm_data <- normalize_expt(t_clinical_nobiop, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_clinical_nobiop-v{ver}.xlsx"))
rpkm_data <- normalize_expt(t_clinical, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 5796 low-count genes (14156 remaining).
## Setting 9184 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_clinical_sva-v{ver}.csv"))
cpm_data <- normalize_expt(t_clinical_nobiop, convert = "cpm",
filter = TRUE, batch = "svaseq")## Removing 8042 low-count genes (11910 remaining).
## Setting 9605 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_clinical_nobiop_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(t_clinical_nobiop, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 8042 low-count genes (11910 remaining).
## Setting 1841 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_clinical_nobiop_sva-v{ver}.csv"))I added a few columns to the gene annotations reflecting the variance/stdev/mean expression of each gene. This really highlights the extraordinary degree to which genes are changing in the data…
top_expt <- normalize_expt(t_clinical_cpm_variance, filter = TRUE) %>%
variance_expt()## Removing 5476 low-count genes (14476 remaining).
top_fd <- fData(top_expt)
top_ex <- exprs(top_expt)
idx <- order(top_fd[["exprs_gene_stdev"]], decreasing = TRUE)
top_df <- top_fd[idx, ]
written <- write_xlsx(
data = head(top_fd, n = 100),
excel = glue("excel/t_clinical_cpm_stdev_top100-v{ver}.xlsx"))
bottom_expt <- normalize_expt(t_clinical_cpm_variance, filter = "simple", threshold = 500) %>%
variance_expt()## This function generally expects the threshold argument to be used with 'thresh'.
## Removing 10013 low-count genes (9939 remaining).
bottom_fd <- fData(bottom_expt)
bottom_ex <- exprs(bottom_expt)
idx <- order(bottom_fd[["exprs_gene_stdev"]])
bottom <- bottom_fd[idx, ]
written <- write_xlsx(
data = head(bottom_fd, n = 100),
excel = glue("excel/t_clinical_cpm_stdev_bottom100-v{ver}.xlsx"))At least in theory, everything which follows will be using the above ‘clinical’ data structure. Thus, let us count it up and get a sense of what we will work with.
table(pData(t_clinical)[["drug"]])##
## antimony
## 123
table(pData(t_clinical)[["clinic"]])##
## tumaco
## 123
table(pData(t_clinical)[["finaloutcome"]])##
## cure failure
## 67 56
table(pData(t_clinical)[["typeofcells"]])##
## biopsy eosinophils monocytes neutrophils
## 14 26 42 41
table(pData(t_clinical)[["visitnumber"]])##
## 3 2 1
## 34 35 54
summary(pData(t_clinical)[["Evolution_Time"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 4.00 7.03 8.00 21.00
summary(pData(t_clinical)[["Prescribed_Daily_Dose_Gluc"]])## Length Class Mode
## 123 character character
summary(pData(t_clinical)[["V3_Vertical_Axis_Lesion"]])## Length Class Mode
## 123 character character
summary(pData(t_clinical)[["V3_Total_Area_Lesion"]])## Length Class Mode
## 123 character character
summary(pData(t_clinical)[["V3_Horizontal_Axis_Ulcer"]])## Length Class Mode
## 123 character character
table(pData(t_clinical)[["Sex"]])##
## 1 2
## 101 22
table(pData(t_clinical)[["Ethnicity"]])##
## 1 2 3
## 76 19 28
summary(pData(t_clinical)[["Age"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 23.0 25.0 28.5 34.0 51.0
summary(pData(t_clinical)[["Weight"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 53.9 58.8 69.4 71.6 83.3 100.8
table(pData(t_clinical)[["Height"]])##
## 152 154 158 159 163 164 165 166 172 173 174 176 177 182 183
## 1 10 15 2 9 15 12 10 10 4 8 1 7 9 10
length(unique(pData(t_clinical)[["Patient_ID"]]))## [1] 19
only_cure <- pData(t_clinical)[["finaloutcome"]] == "cure"
c_meta <- pData(t_clinical)[only_cure, ]
length(unique(c_meta[["Patient_ID"]]))## [1] 10
only_fail <- pData(t_clinical)[["finaloutcome"]] == "failure"
f_meta <- pData(t_clinical)[only_fail, ]
length(unique(f_meta[["Patient_ID"]]))## [1] 9
t_cure <- subset_expt(t_clinical, subset = "finaloutcome=='cure'")## The samples excluded are: TMRC30017, TMRC30019, TMRC30071, TMRC30056, TMRC30105, TMRC30058, TMRC30094, TMRC30119, TMRC30122, TMRC30107, TMRC30096, TMRC30083, TMRC30115, TMRC30118, TMRC30121, TMRC30026, TMRC30048, TMRC30054, TMRC30046, TMRC30070, TMRC30049, TMRC30055, TMRC30047, TMRC30053, TMRC30068, TMRC30123, TMRC30072, TMRC30078, TMRC30116, TMRC30076, TMRC30088, TMRC30197, TMRC30199, TMRC30198, TMRC30201, TMRC30200, TMRC30203, TMRC30202, TMRC30205, TMRC30204, TMRC30177, TMRC30241, TMRC30237, TMRC30206, TMRC30207, TMRC30238, TMRC30074, TMRC30217, TMRC30208, TMRC30077, TMRC30219, TMRC30218, TMRC30079, TMRC30220, TMRC30264, TMRC30265.
## subset_expt(): There were 123, now there are 67 samples.
table(pData(t_cure)$visitnumber)##
## 3 2 1
## 17 20 30
t_fail <- subset_expt(t_clinical, subset = "finaloutcome=='failure'")## The samples excluded are: TMRC30016, TMRC30018, TMRC30020, TMRC30113, TMRC30164, TMRC30080, TMRC30082, TMRC30103, TMRC30022, TMRC30169, TMRC30093, TMRC30029, TMRC30170, TMRC30032, TMRC30028, TMRC30180, TMRC30014, TMRC30196, TMRC30030, TMRC30021, TMRC30037, TMRC30031, TMRC30165, TMRC30027, TMRC30044, TMRC30194, TMRC30166, TMRC30195, TMRC30045, TMRC30191, TMRC30041, TMRC30171, TMRC30192, TMRC30139, TMRC30042, TMRC30158, TMRC30132, TMRC30160, TMRC30157, TMRC30183, TMRC30167, TMRC30181, TMRC30133, TMRC30043, TMRC30184, TMRC30159, TMRC30129, TMRC30172, TMRC30134, TMRC30174, TMRC30137, TMRC30161, TMRC30142, TMRC30175, TMRC30145, TMRC30143, TMRC30168, TMRC30146, TMRC30182, TMRC30152, TMRC30155, TMRC30154, TMRC30136, TMRC30135, TMRC30173, TMRC30144, TMRC30147.
## subset_expt(): There were 123, now there are 56 samples.
table(pData(t_fail)$visitnumber)##
## 3 2 1
## 17 15 24
I previously made a bunch of data subsets by visit, cell type, etc. So let us overwrite them all with versions which contain only the Tumaco samples.
The is a copy of the cell type extractions above, except we lead off with the removal of the Cali samples.
t_biopsies <- tc_biopsies %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30156, TMRC30269, TMRC30253, TMRC30270.
## subset_expt(): There were 18, now there are 14 samples.
save(list = "t_biopsies", file = glue("rda/tmrc3_t_biopsies-v{ver}.rda"))
data_structures <- c(data_structures, "t_biopsies")
cpm_data <- normalize_expt(t_biopsies, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_biopsies-v{ver}.xlsx"))
rpkm_data <- normalize_expt(t_biopsies, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 6439 low-count genes (13513 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_biopsies-v{ver}.csv"))
cpm_data <- normalize_expt(t_biopsies, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 6439 low-count genes (13513 remaining).
## Setting 146 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_biopsies_sva-v{ver}.xlsx"))t_eosinophils <- tc_eosinophils %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30138, TMRC30151, TMRC30227, TMRC30230, TMRC30233, TMRC30211, TMRC30216, TMRC30271, TMRC30272, TMRC30254, TMRC30277, TMRC30278, TMRC30257, TMRC30281, TMRC30282.
## subset_expt(): There were 41, now there are 26 samples.
save(list = "t_eosinophils", file = glue("rda/tmrc3_t_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "t_eosinophils")
cpm_data <- normalize_expt(t_eosinophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_eosinophils-v{ver}.xlsx"))
cpm_data <- normalize_expt(t_eosinophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 9420 low-count genes (10532 remaining).
## Setting 327 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_eosinophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(t_eosinophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9420 low-count genes (10532 remaining).
## Setting 106 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_eosinophils_sva-v{ver}.csv"))t_monocytes <- tc_monocytes %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30185, TMRC30178, TMRC30221, TMRC30223, TMRC30148, TMRC30150, TMRC30176, TMRC30234, TMRC30225, TMRC30228, TMRC30231, TMRC30209, TMRC30212, TMRC30214, TMRC30273, TMRC30274, TMRC30255, TMRC30239, TMRC30279, TMRC30258, TMRC30283.
## subset_expt(): There were 63, now there are 42 samples.
save(list = "t_monocytes", file = glue("rda/tmrc3_t_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "t_monocytes")
cpm_data <- normalize_expt(t_monocytes, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_monocytes-v{ver}.xlsx"))
rpkm_data <- normalize_expt(t_monocytes, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9090 low-count genes (10862 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(t_monocytes, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 9090 low-count genes (10862 remaining).
## Setting 736 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_monocytes_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(t_monocytes, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9090 low-count genes (10862 remaining).
## Setting 295 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_monocytes_sva-v{ver}.csv"))t_neutrophils <- tc_neutrophils %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30186, TMRC30179, TMRC30222, TMRC30224, TMRC30149, TMRC30140, TMRC30153, TMRC30235, TMRC30226, TMRC30229, TMRC30232, TMRC30210, TMRC30213, TMRC30215, TMRC30275, TMRC30276, TMRC30256, TMRC30240, TMRC30280, TMRC30284, TMRC30285.
## subset_expt(): There were 62, now there are 41 samples.
save(list = "t_neutrophils", file = glue("rda/tmrc3_t_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "t_neutrophils")
cpm_data <- normalize_expt(t_neutrophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_neutrophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(t_neutrophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 10851 low-count genes (9101 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(t_neutrophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 10851 low-count genes (9101 remaining).
## Setting 754 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_neutrophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(t_neutrophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 10851 low-count genes (9101 remaining).
## Setting 214 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_neutrophils_sva-v{ver}.csv"))tv1_samples <- tcv1_samples %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30156, TMRC30185, TMRC30186, TMRC30221, TMRC30222, TMRC30269, TMRC30148, TMRC30149, TMRC30253, TMRC30138, TMRC30234, TMRC30235, TMRC30270, TMRC30225, TMRC30226, TMRC30227, TMRC30209, TMRC30210, TMRC30211, TMRC30271, TMRC30273, TMRC30275, TMRC30239, TMRC30240, TMRC30257, TMRC30258, TMRC30281, TMRC30283, TMRC30284.
## subset_expt(): There were 83, now there are 54 samples.
save(list = "tv1_samples", file = glue("rda/tmrc3_tv1_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tv1_samples")
cpm_data <- normalize_expt(tv1_samples, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv1_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv1_samples, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 5929 low-count genes (14023 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv1_samples-v{ver}.csv"))
cpm_data <- normalize_expt(tv1_samples, convert="cpm", filter=TRUE, batch="svaseq")## Removing 5929 low-count genes (14023 remaining).
## Setting 7655 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv1_samples_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv1_samples, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 5929 low-count genes (14023 remaining).
## Setting 2552 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv1_samples_sva-v{ver}.csv"))tv2_samples <- tcv2_samples %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30223, TMRC30224, TMRC30150, TMRC30140, TMRC30151, TMRC30228, TMRC30229, TMRC30230, TMRC30212, TMRC30213, TMRC30272, TMRC30274, TMRC30276, TMRC30277, TMRC30282.
## subset_expt(): There were 50, now there are 35 samples.
save(list = "tv2_samples", file = glue("rda/tmrc3_tv2_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tv2_samples")
cpm_data <- normalize_expt(tv2_samples, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv2_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv2_samples, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 8390 low-count genes (11562 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv2_samples-v{ver}.csv"))
cpm_data <- normalize_expt(tv2_samples, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 8390 low-count genes (11562 remaining).
## Setting 2857 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv2_samples_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv2_samples, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 8390 low-count genes (11562 remaining).
## Setting 415 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv2_samples_sva-v{ver}.csv"))tv3_samples <- tcv3_samples %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30178, TMRC30179, TMRC30176, TMRC30153, TMRC30231, TMRC30232, TMRC30233, TMRC30216, TMRC30214, TMRC30215, TMRC30254, TMRC30255, TMRC30256, TMRC30278, TMRC30279, TMRC30280, TMRC30285.
## subset_expt(): There were 51, now there are 34 samples.
save(list = "tv3_samples", file = glue("rda/tmrc3_tv3_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tv3_samples")
cpm_data <- normalize_expt(tv3_samples, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv3_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv3_samples, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 8500 low-count genes (11452 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv3_samples-v{ver}.csv"))
cpm_data <- normalize_expt(tv3_samples, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 8500 low-count genes (11452 remaining).
## Setting 1887 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv3_samples_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv3_samples, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 8500 low-count genes (11452 remaining).
## Setting 247 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv3_samples_sva-v{ver}.csv"))tv1_eosinophils <- tcv1_eosinophils %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30138, TMRC30227, TMRC30211, TMRC30271, TMRC30257, TMRC30281.
## subset_expt(): There were 14, now there are 8 samples.
save(list = "tv1_eosinophils", file = glue("rda/tmrc3_tv1_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv1_eosinophils")
cpm_data <- normalize_expt(tv1_eosinophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv1_eosinophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv1_eosinophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9973 low-count genes (9979 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv1_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv1_eosinophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 9973 low-count genes (9979 remaining).
## Setting 57 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv1_eosinophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv1_eosinophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9973 low-count genes (9979 remaining).
## Setting 20 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv1_eosinophils_sva-v{ver}.csv"))tv2_eosinophils <- tcv2_eosinophils %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30151, TMRC30230, TMRC30272, TMRC30277, TMRC30282.
## subset_expt(): There were 14, now there are 9 samples.
save(list = "tv2_eosinophils", file = glue("rda/tmrc3_tv2_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv2_eosinophils")
cpm_data <- normalize_expt(tv2_eosinophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv2_eosinophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv2_eosinophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9835 low-count genes (10117 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv2_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv2_eosinophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 9835 low-count genes (10117 remaining).
## Setting 90 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv2_eosinophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv2_eosinophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9835 low-count genes (10117 remaining).
## Setting 39 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv2_eosinophils_sva-v{ver}.csv"))tv3_eosinophils <- tcv3_eosinophils %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30233, TMRC30216, TMRC30254, TMRC30278.
## subset_expt(): There were 13, now there are 9 samples.
save(list = "tv3_eosinophils", file = glue("rda/tmrc3_tv3_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv3_eosinophils")
cpm_data <- normalize_expt(tv3_eosinophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv3_eosinophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv3_eosinophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9872 low-count genes (10080 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv3_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv3_eosinophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 9872 low-count genes (10080 remaining).
## Setting 48 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv3_eosinophils_sva-v{ver}.xlsx"))
## Note, the cbcb filter left behind a sufficient number of zeros that it confused cpm.
rpkm_data <- normalize_expt(tv3_eosinophils, filter = "simple", batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 3353 low-count genes (16599 remaining).
## Setting 7019 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv3_eosinophils_sva-v{ver}.csv"))tv1_monocytes <- tcv1_monocytes %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30185, TMRC30221, TMRC30148, TMRC30234, TMRC30225, TMRC30209, TMRC30273, TMRC30239, TMRC30258, TMRC30283.
## subset_expt(): There were 26, now there are 16 samples.
save(list = "tv1_monocytes", file = glue("rda/tmrc3_tv1_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tv1_monocytes")
cpm_data <- normalize_expt(tv1_monocytes, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv1_monocytes-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv1_monocytes, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9470 low-count genes (10482 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv1_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(tv1_monocytes, convert="cpm", filter=TRUE, batch="svaseq")## Removing 9470 low-count genes (10482 remaining).
## Setting 190 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv1_monocytes_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv1_monocytes, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9470 low-count genes (10482 remaining).
## Setting 80 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv1_monocytes_sva-v{ver}.csv"))tv2_monocytes <- tcv2_monocytes %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30223, TMRC30150, TMRC30228, TMRC30212, TMRC30274.
## subset_expt(): There were 18, now there are 13 samples.
save(list = "tv2_monocytes", file = glue("rda/tmrc3_tv2_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tv2_monocytes")
cpm_data <- normalize_expt(tv2_monocytes, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv2_monocytes-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv2_monocytes, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9429 low-count genes (10523 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv2_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(tv2_monocytes, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 9429 low-count genes (10523 remaining).
## Setting 117 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv2_monocytes_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv2_monocytes, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9429 low-count genes (10523 remaining).
## Setting 39 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv2_monocytes_sva-v{ver}.csv"))tv3_monocytes <- tcv3_monocytes %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30178, TMRC30176, TMRC30231, TMRC30214, TMRC30255, TMRC30279.
## subset_expt(): There were 19, now there are 13 samples.
save(list = "tv3_monocytes", file = glue("rda/tmrc3_tv3_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tv3_monocytes")
cpm_data <- normalize_expt(tv3_monocytes, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv3_monocytes-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv3_monocytes, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9575 low-count genes (10377 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv3_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(tv3_monocytes, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 9575 low-count genes (10377 remaining).
## Setting 58 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv3_monocytes_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv3_monocytes, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9575 low-count genes (10377 remaining).
## Setting 28 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv3_monocytes_sva-v{ver}.csv"))tv1_neutrophils <- tcv1_neutrophils %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30186, TMRC30222, TMRC30149, TMRC30235, TMRC30226, TMRC30210, TMRC30275, TMRC30240, TMRC30284.
## subset_expt(): There were 25, now there are 16 samples.
save(list = "tv1_neutrophils", file = glue("rda/tmrc3_tv1_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv1_neutrophils")
cpm_data <- normalize_expt(tv1_neutrophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv1_neutrophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv1_neutrophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 11235 low-count genes (8717 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv1_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv1_neutrophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 11235 low-count genes (8717 remaining).
## Setting 145 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv1_neutrophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv1_neutrophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 11235 low-count genes (8717 remaining).
## Setting 41 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv1_neutrophils_sva-v{ver}.csv"))tv2_neutrophils <- tcv2_neutrophils %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30224, TMRC30140, TMRC30229, TMRC30213, TMRC30276.
## subset_expt(): There were 18, now there are 13 samples.
save(list = "tv2_neutrophils", file = glue("rda/tmrc3_tv2_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv2_neutrophils")
cpm_data <- normalize_expt(tv2_neutrophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv2_neutrophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv2_neutrophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 11500 low-count genes (8452 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv2_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv2_neutrophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 11500 low-count genes (8452 remaining).
## Setting 78 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv2_neutrophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv2_neutrophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 11500 low-count genes (8452 remaining).
## Setting 41 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv2_neutrophils_sva-v{ver}.csv"))tv3_neutrophils <- tcv3_neutrophils %>%
subset_expt(subset = "clinic=='tumaco'")## The samples excluded are: TMRC30179, TMRC30153, TMRC30232, TMRC30215, TMRC30256, TMRC30280, TMRC30285.
## subset_expt(): There were 19, now there are 12 samples.
save(list = "tv3_neutrophils", file = glue("rda/tmrc3_tv3_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv3_neutrophils")
cpm_data <- normalize_expt(tv3_neutrophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv3_neutrophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv3_neutrophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 11447 low-count genes (8505 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv3_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv3_neutrophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 11447 low-count genes (8505 remaining).
## Setting 83 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/tv3_neutrophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(tv3_neutrophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 11447 low-count genes (8505 remaining).
## Setting 18 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/tv3_neutrophils_sva-v{ver}.csv"))Even though we are only considering the Tumaco samples, one might wish to perform comparisons among the Cali samples. The following blocks therefore separate them out in a similar fashion.
dir.create("cpm/4_Cali", recursive = TRUE)
c_clinical <- tc_clinical %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30071, TMRC30020, TMRC30056, TMRC30113, TMRC30105, TMRC30058, TMRC30164, TMRC30080, TMRC30094, TMRC30119, TMRC30082, TMRC30103, TMRC30122, TMRC30022, TMRC30169, TMRC30093, TMRC30029, TMRC30107, TMRC30170, TMRC30032, TMRC30096, TMRC30083, TMRC30028, TMRC30115, TMRC30118, TMRC30180, TMRC30014, TMRC30121, TMRC30196, TMRC30030, TMRC30021, TMRC30026, TMRC30037, TMRC30031, TMRC30165, TMRC30027, TMRC30044, TMRC30194, TMRC30166, TMRC30195, TMRC30048, TMRC30054, TMRC30045, TMRC30046, TMRC30070, TMRC30049, TMRC30055, TMRC30047, TMRC30191, TMRC30053, TMRC30041, TMRC30068, TMRC30171, TMRC30192, TMRC30139, TMRC30042, TMRC30158, TMRC30132, TMRC30160, TMRC30157, TMRC30183, TMRC30167, TMRC30123, TMRC30181, TMRC30072, TMRC30133, TMRC30043, TMRC30078, TMRC30116, TMRC30184, TMRC30076, TMRC30159, TMRC30129, TMRC30088, TMRC30172, TMRC30134, TMRC30174, TMRC30137, TMRC30161, TMRC30142, TMRC30175, TMRC30145, TMRC30143, TMRC30168, TMRC30197, TMRC30146, TMRC30182, TMRC30199, TMRC30198, TMRC30201, TMRC30200, TMRC30203, TMRC30202, TMRC30205, TMRC30204, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241, TMRC30237, TMRC30206, TMRC30136, TMRC30207, TMRC30238, TMRC30074, TMRC30217, TMRC30208, TMRC30077, TMRC30219, TMRC30218, TMRC30079, TMRC30220, TMRC30135, TMRC30173, TMRC30264, TMRC30144, TMRC30147, TMRC30265.
## subset_expt(): There were 184, now there are 61 samples.
save(list = "c_clinical", file = glue("rda/tmrc3_c_clinical-v{ver}.rda"))
data_structures <- c(data_structures, "c_clinical")
cpm_data <- normalize_expt(c_clinical, convert = "cpm")
c_clinical_cpm_variance <- variance_expt(cpm_data)
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_clinical-v{ver}.xlsx"))
rpkm_data <- normalize_expt(c_clinical, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 6553 low-count genes (13399 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/t_clinical-v{ver}.csv"))
cpm_data <- normalize_expt(c_clinical, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 6553 low-count genes (13399 remaining).
## Setting 5038 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_clinical_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(c_clinical, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 6553 low-count genes (13399 remaining).
## Setting 3274 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/c_clinical_sva-v{ver}.csv"))The Cali samples without biopsies.
c_clinical_nobiop <- subset_expt(c_clinical, subset = "typeofcells!='biopsy'")## The samples excluded are: TMRC30156, TMRC30269, TMRC30253, TMRC30270.
## subset_expt(): There were 61, now there are 57 samples.
save(list = "c_clinical_nobiop", file = glue("rda/tmrc3_c_clinical_nobiop-v{ver}.rda"))
data_structures <- c(data_structures, "c_clinical_nobiop")
cpm_data <- normalize_expt(c_clinical_nobiop, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_clinical_nobiop-v{ver}.xlsx"))
rpkm_data <- normalize_expt(c_clinical, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 6553 low-count genes (13399 remaining).
## Setting 3274 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/c_clinical_sva-v{ver}.csv"))
cpm_data <- normalize_expt(c_clinical_nobiop, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 8355 low-count genes (11597 remaining).
## Setting 3411 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_clinical_nobiop_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(c_clinical_nobiop, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 8355 low-count genes (11597 remaining).
## Setting 637 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/c_clinical_nobiop_sva-v{ver}.csv"))I added a few columns to the gene annotations reflecting the variance/stdev/mean expression of each gene. This really highlights the extraordinary degree to which genes are changing in the data…
top_expt <- normalize_expt(c_clinical_cpm_variance, filter = TRUE) %>%
variance_expt()## Removing 6223 low-count genes (13729 remaining).
top_fd <- fData(top_expt)
top_ex <- exprs(top_expt)
idx <- order(top_fd[["exprs_gene_stdev"]], decreasing = TRUE)
top_df <- top_fd[idx, ]
written <- write_xlsx(
data = head(top_fd, n = 100),
excel = glue("excel/c_clinical_cpm_stdev_top100-v{ver}.xlsx"))
bottom_expt <- normalize_expt(c_clinical_cpm_variance, filter = "simple", threshold = 500) %>%
variance_expt()## This function generally expects the threshold argument to be used with 'thresh'.
## Removing 11621 low-count genes (8331 remaining).
bottom_fd <- fData(bottom_expt)
bottom_ex <- exprs(bottom_expt)
idx <- order(bottom_fd[["exprs_gene_stdev"]])
bottom <- bottom_fd[idx, ]
written <- write_xlsx(
data = head(bottom_fd, n = 100),
excel = glue("excel/c_clinical_cpm_stdev_bottom100-v{ver}.xlsx"))This is copy/pasted from above and prints out how many of each sample type there are of each category.
table(pData(c_clinical)[["drug"]])##
## antimony
## 61
table(pData(c_clinical)[["clinic"]])##
## cali
## 61
table(pData(c_clinical)[["finaloutcome"]])##
## cure failure
## 55 6
table(pData(c_clinical)[["typeofcells"]])##
## biopsy eosinophils monocytes neutrophils
## 4 15 21 21
table(pData(c_clinical)[["visitnumber"]])##
## 3 2 1
## 17 15 29
summary(pData(c_clinical)[["Evolution_Time"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 6.0 8.0 10.5 15.0 20.0
summary(pData(c_clinical)[["Prescribed_Daily_Dose_Gluc"]])## Length Class Mode
## 61 character character
summary(pData(c_clinical)[["V3_Vertical_Axis_Lesion"]])## Length Class Mode
## 61 character character
summary(pData(c_clinical)[["V3_Total_Area_Lesion"]])## Length Class Mode
## 61 character character
summary(pData(c_clinical)[["V3_Horizontal_Axis_Ulcer"]])## Length Class Mode
## 61 character character
table(pData(c_clinical)[["Sex"]])##
## 1 2
## 55 6
table(pData(c_clinical)[["Ethnicity"]])##
## 1 2 3
## 15 27 19
summary(pData(c_clinical)[["Age"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 28.0 33.0 36.0 35.2 37.0 44.0
summary(pData(c_clinical)[["Height"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 155 160 169 167 174 174
summary(pData(c_clinical)[["Weight"]])## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 58.0 72.0 77.0 78.2 87.0 100.0
length(unique(pData(c_clinical)[["Patient_ID"]]))## [1] 10
only_cure <- pData(c_clinical)[["finaloutcome"]] == "cure"
c_meta <- pData(c_clinical)[only_cure, ]
length(unique(c_meta[["Patient_ID"]]))## [1] 9
only_fail <- pData(c_clinical)[["finaloutcome"]] == "failure"
f_meta <- pData(c_clinical)[only_fail, ]
length(unique(f_meta[["Patient_ID"]]))## [1] 1
cali_cure <- subset_expt(c_clinical, subset = "finaloutcome=='cure'")## The samples excluded are: TMRC30178, TMRC30179, TMRC30221, TMRC30222, TMRC30223, TMRC30224.
## subset_expt(): There were 61, now there are 55 samples.
table(pData(cali_cure)[["visitnumber"]])##
## 3 2 1
## 15 13 27
cali_fail <- subset_expt(c_clinical, subset = "finaloutcome=='failure'")## The samples excluded are: TMRC30156, TMRC30185, TMRC30186, TMRC30269, TMRC30148, TMRC30149, TMRC30253, TMRC30150, TMRC30140, TMRC30138, TMRC30176, TMRC30153, TMRC30151, TMRC30234, TMRC30235, TMRC30270, TMRC30225, TMRC30226, TMRC30227, TMRC30228, TMRC30229, TMRC30230, TMRC30231, TMRC30232, TMRC30233, TMRC30209, TMRC30210, TMRC30211, TMRC30212, TMRC30213, TMRC30216, TMRC30214, TMRC30215, TMRC30271, TMRC30273, TMRC30275, TMRC30272, TMRC30274, TMRC30276, TMRC30254, TMRC30255, TMRC30256, TMRC30277, TMRC30239, TMRC30240, TMRC30278, TMRC30279, TMRC30280, TMRC30257, TMRC30258, TMRC30281, TMRC30283, TMRC30284, TMRC30282, TMRC30285.
## subset_expt(): There were 61, now there are 6 samples.
table(pData(cali_fail)[["visitnumber"]])##
## 3 2 1
## 2 2 2
I previously made a bunch of data subsets by visit, cell type, etc. So let us overwrite them all with versions which contain only the Cali samples.
There is no going back to the Cali samples after this block unless we regenerate the data from the original expressionsets.
c_biopsies <- tc_biopsies %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30020, TMRC30022, TMRC30026, TMRC30044, TMRC30045, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241.
## subset_expt(): There were 18, now there are 4 samples.
save(list = "c_biopsies", file = glue("rda/tmrc3_c_biopsies-v{ver}.rda"))
data_structures <- c(data_structures, "c_biopsies")
cpm_data <- normalize_expt(c_biopsies, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_biopsies-v{ver}.xlsx"))
rpkm_data <- normalize_expt(c_biopsies, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 7380 low-count genes (12572 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/c_biopsies-v{ver}.csv"))
cpm_data <- normalize_expt(c_biopsies, convert = "cpm", filter = TRUE)## Removing 7380 low-count genes (12572 remaining).
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_biopsies_nosva-v{ver}.xlsx"))c_eosinophils <- tc_eosinophils %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30071, TMRC30113, TMRC30164, TMRC30119, TMRC30122, TMRC30029, TMRC30032, TMRC30028, TMRC30180, TMRC30196, TMRC30048, TMRC30054, TMRC30070, TMRC30043, TMRC30159, TMRC30161, TMRC30168, TMRC30182, TMRC30136, TMRC30074, TMRC30077, TMRC30079, TMRC30135, TMRC30173, TMRC30144, TMRC30147.
## subset_expt(): There were 41, now there are 15 samples.
save(list = "c_eosinophils", file = glue("rda/tmrc3_c_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "c_eosinophils")
cpm_data <- normalize_expt(c_eosinophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_eosinophils-v{ver}.xlsx"))
cpm_data <- normalize_expt(c_eosinophils, convert = "cpm", filter = TRUE)## Removing 9602 low-count genes (10350 remaining).
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_eosinophils_nosva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(c_eosinophils, filter = TRUE) %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9602 low-count genes (10350 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/c_eosinophils_nosva-v{ver}.csv"))c_monocytes <- tc_monocytes %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30056, TMRC30105, TMRC30080, TMRC30082, TMRC30169, TMRC30107, TMRC30096, TMRC30115, TMRC30014, TMRC30030, TMRC30037, TMRC30165, TMRC30194, TMRC30046, TMRC30049, TMRC30055, TMRC30191, TMRC30041, TMRC30171, TMRC30139, TMRC30132, TMRC30157, TMRC30183, TMRC30123, TMRC30072, TMRC30078, TMRC30184, TMRC30129, TMRC30172, TMRC30174, TMRC30142, TMRC30145, TMRC30197, TMRC30199, TMRC30201, TMRC30203, TMRC30205, TMRC30237, TMRC30207, TMRC30217, TMRC30219, TMRC30264.
## subset_expt(): There were 63, now there are 21 samples.
save(list = "c_monocytes", file = glue("rda/tmrc3_c_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "c_monocytes")
cpm_data <- normalize_expt(c_monocytes, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_monocytes-v{ver}.xlsx"))
rpkm_data <- normalize_expt(c_monocytes, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9380 low-count genes (10572 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/c_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(c_monocytes, convert = "cpm", filter = TRUE)## Removing 9380 low-count genes (10572 remaining).
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_monocytes_nosva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(c_monocytes, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9380 low-count genes (10572 remaining).
## Setting 18 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/c_monocytes_sva-v{ver}.csv"))c_neutrophils <- tc_neutrophils %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30058, TMRC30094, TMRC30103, TMRC30093, TMRC30170, TMRC30083, TMRC30118, TMRC30121, TMRC30021, TMRC30031, TMRC30027, TMRC30166, TMRC30195, TMRC30047, TMRC30053, TMRC30068, TMRC30192, TMRC30042, TMRC30158, TMRC30160, TMRC30167, TMRC30181, TMRC30133, TMRC30116, TMRC30076, TMRC30088, TMRC30134, TMRC30137, TMRC30175, TMRC30143, TMRC30146, TMRC30198, TMRC30200, TMRC30202, TMRC30204, TMRC30206, TMRC30238, TMRC30208, TMRC30218, TMRC30220, TMRC30265.
## subset_expt(): There were 62, now there are 21 samples.
save(list = "c_neutrophils", file = glue("rda/tmrc3_c_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "c_neutrophils")
cpm_data <- normalize_expt(c_neutrophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_neutrophil-v{ver}.xlsx"))
rpkm_data <- normalize_expt(c_neutrophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 11521 low-count genes (8431 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/c_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(c_neutrophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 11521 low-count genes (8431 remaining).
## Setting 118 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_neutrophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(c_neutrophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 11521 low-count genes (8431 remaining).
## Setting 30 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/c_neutrophils_sva-v{ver}.csv"))cv1_samples <- tcv1_samples %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30071, TMRC30020, TMRC30080, TMRC30103, TMRC30022, TMRC30029, TMRC30107, TMRC30083, TMRC30180, TMRC30014, TMRC30021, TMRC30026, TMRC30165, TMRC30044, TMRC30166, TMRC30048, TMRC30045, TMRC30046, TMRC30047, TMRC30191, TMRC30041, TMRC30192, TMRC30042, TMRC30132, TMRC30167, TMRC30123, TMRC30043, TMRC30116, TMRC30184, TMRC30134, TMRC30174, TMRC30175, TMRC30168, TMRC30197, TMRC30198, TMRC30203, TMRC30204, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241, TMRC30237, TMRC30207, TMRC30238, TMRC30074, TMRC30208, TMRC30135, TMRC30264, TMRC30265.
## subset_expt(): There were 83, now there are 29 samples.
save(list = "cv1_samples", file = glue("rda/tmrc3_cv1_samples-v{ver}.rda"))
data_structures <- c(data_structures, "cv1_samples")
cpm_data <- normalize_expt(cv1_samples, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv1_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv1_samples, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 6689 low-count genes (13263 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv1_samples-v{ver}.csv"))
cpm_data <- normalize_expt(cv1_samples, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 6689 low-count genes (13263 remaining).
## Setting 1987 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv1_samples_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv1_samples, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 6689 low-count genes (13263 remaining).
## Setting 665 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv1_samples_sva-v{ver}.csv"))cv2_samples <- tcv2_samples %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30056, TMRC30113, TMRC30058, TMRC30119, TMRC30082, TMRC30093, TMRC30032, TMRC30096, TMRC30118, TMRC30196, TMRC30030, TMRC30031, TMRC30194, TMRC30195, TMRC30054, TMRC30049, TMRC30053, TMRC30171, TMRC30158, TMRC30157, TMRC30181, TMRC30072, TMRC30076, TMRC30159, TMRC30129, TMRC30137, TMRC30142, TMRC30143, TMRC30182, TMRC30199, TMRC30200, TMRC30217, TMRC30077, TMRC30218, TMRC30144.
## subset_expt(): There were 50, now there are 15 samples.
save(list = "cv2_samples", file = glue("rda/tmrc3_cv2_samples-v{ver}.rda"))
data_structures <- c(data_structures, "cv2_samples")
cpm_data <- normalize_expt(cv2_samples, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv2_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv2_samples, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 8890 low-count genes (11062 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv2_samples-v{ver}.csv"))
cpm_data <- normalize_expt(cv2_samples, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 8890 low-count genes (11062 remaining).
## Setting 345 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv2_samples_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv2_samples, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 8890 low-count genes (11062 remaining).
## Setting 61 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv2_samples_sva-v{ver}.csv"))cv3_samples <- tcv3_samples %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30105, TMRC30164, TMRC30094, TMRC30122, TMRC30169, TMRC30170, TMRC30028, TMRC30115, TMRC30121, TMRC30037, TMRC30027, TMRC30070, TMRC30055, TMRC30068, TMRC30139, TMRC30160, TMRC30183, TMRC30133, TMRC30078, TMRC30088, TMRC30172, TMRC30161, TMRC30145, TMRC30146, TMRC30201, TMRC30202, TMRC30205, TMRC30206, TMRC30136, TMRC30219, TMRC30079, TMRC30220, TMRC30173, TMRC30147.
## subset_expt(): There were 51, now there are 17 samples.
save(list = "cv3_samples", file = glue("rda/tmrc3_cv3_samples-v{ver}.rda"))
data_structures <- c(data_structures, "cv3_samples")
cpm_data <- normalize_expt(cv3_samples, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv3_samples-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv3_samples, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 8862 low-count genes (11090 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv3_samples-v{ver}.csv"))
cpm_data <- normalize_expt(cv3_samples, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 8862 low-count genes (11090 remaining).
## Setting 508 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv3_samples_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv3_samples, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 8862 low-count genes (11090 remaining).
## Setting 78 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv3_samples_sva-v{ver}.csv"))cv1_eosinophils <- tcv1_eosinophils %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30071, TMRC30029, TMRC30180, TMRC30048, TMRC30043, TMRC30168, TMRC30074, TMRC30135.
## subset_expt(): There were 14, now there are 6 samples.
save(list = "cv1_eosinophils", file = glue("rda/tmrc3_cv1_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "cv1_eosinophils")
cpm_data <- normalize_expt(cv1_eosinophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv1_eosinophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv1_eosinophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 10112 low-count genes (9840 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv1_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(cv1_eosinophils, convert = "cpm", filter = TRUE)## Removing 10112 low-count genes (9840 remaining).
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv1_eosinophils_nosva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv1_eosinophils, filter = TRUE) %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 10112 low-count genes (9840 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv1_eosinophils_nosva-v{ver}.csv"))cv2_eosinophils <- tcv2_eosinophils %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30113, TMRC30119, TMRC30032, TMRC30196, TMRC30054, TMRC30159, TMRC30182, TMRC30077, TMRC30144.
## subset_expt(): There were 14, now there are 5 samples.
save(list = "cv2_eosinophils", file = glue("rda/tmrc3_cv2_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "cv2_eosinophils")
cpm_data <- normalize_expt(cv2_eosinophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv2_eosinophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv2_eosinophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 10131 low-count genes (9821 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv2_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(cv2_eosinophils, convert = "cpm", filter = TRUE)## Removing 10131 low-count genes (9821 remaining).
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv2_eosinophils_nosva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv2_eosinophils, filter = TRUE) %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 10131 low-count genes (9821 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv2_eosinophils_nosva-v{ver}.csv"))cv3_eosinophils <- tcv3_eosinophils %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30164, TMRC30122, TMRC30028, TMRC30070, TMRC30161, TMRC30136, TMRC30079, TMRC30173, TMRC30147.
## subset_expt(): There were 13, now there are 4 samples.
save(list = "cv3_eosinophils", file = glue("rda/tmrc3_cv3_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "cv3_eosinophils")
cpm_data <- normalize_expt(cv3_eosinophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv3_eosinophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv3_eosinophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 10257 low-count genes (9695 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv3_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(cv3_eosinophils, convert = "cpm", filter = TRUE)## Removing 10257 low-count genes (9695 remaining).
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv3_eosinophils_nosva-v{ver}.xlsx"))
## Note, the cbcb filter left behind a sufficient number of zeros that it confused cpm.
rpkm_data <- normalize_expt(cv3_eosinophils, filter = "simple") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 5292 low-count genes (14660 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv3_eosinophils_nosva-v{ver}.csv"))cv1_monocytes <- tcv1_monocytes %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30080, TMRC30107, TMRC30014, TMRC30165, TMRC30046, TMRC30191, TMRC30041, TMRC30132, TMRC30123, TMRC30184, TMRC30174, TMRC30197, TMRC30203, TMRC30237, TMRC30207, TMRC30264.
## subset_expt(): There were 26, now there are 10 samples.
save(list = "cv1_monocytes", file = glue("rda/tmrc3_cv1_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "cv1_monocytes")
cpm_data <- normalize_expt(cv1_monocytes, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv1_monocytes-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv1_monocytes, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9574 low-count genes (10378 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv1_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(cv1_monocytes, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 9574 low-count genes (10378 remaining).
## Setting 46 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv1_monocytes_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv1_monocytes, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9574 low-count genes (10378 remaining).
## Setting 9 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv1_monocytes_sva-v{ver}.csv"))cv2_monocytes <- tcv2_monocytes %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30056, TMRC30082, TMRC30096, TMRC30030, TMRC30194, TMRC30049, TMRC30171, TMRC30157, TMRC30072, TMRC30129, TMRC30142, TMRC30199, TMRC30217.
## subset_expt(): There were 18, now there are 5 samples.
save(list = "cv2_monocytes", file = glue("rda/tmrc3_cv2_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "cv2_monocytes")
cpm_data <- normalize_expt(cv2_monocytes, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv2_monocytes-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv2_monocytes, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9817 low-count genes (10135 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv2_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(cv2_monocytes, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 9817 low-count genes (10135 remaining).
## Setting 5 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv2_monocytes_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv2_monocytes, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9817 low-count genes (10135 remaining).
## Setting 2 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv2_monocytes_sva-v{ver}.csv"))cv3_monocytes <- tcv3_monocytes %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30105, TMRC30169, TMRC30115, TMRC30037, TMRC30055, TMRC30139, TMRC30183, TMRC30078, TMRC30172, TMRC30145, TMRC30201, TMRC30205, TMRC30219.
## subset_expt(): There were 19, now there are 6 samples.
save(list = "cv3_monocytes", file = glue("rda/tmrc3_cv3_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "cv3_monocytes")
cpm_data <- normalize_expt(cv3_monocytes, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv3_monocytes-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv3_monocytes, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 9780 low-count genes (10172 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv3_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(cv3_monocytes, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 9780 low-count genes (10172 remaining).
## Setting 14 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv3_monocytes_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv3_monocytes, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 9780 low-count genes (10172 remaining).
## Setting 5 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv3_monocytes_sva-v{ver}.csv"))cv1_neutrophils <- tcv1_neutrophils %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30103, TMRC30083, TMRC30021, TMRC30166, TMRC30047, TMRC30192, TMRC30042, TMRC30167, TMRC30116, TMRC30134, TMRC30175, TMRC30198, TMRC30204, TMRC30238, TMRC30208, TMRC30265.
## subset_expt(): There were 25, now there are 9 samples.
save(list = "cv1_neutrophils", file = glue("rda/tmrc3_cv1_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "cv1_neutrophils")
cpm_data <- normalize_expt(cv1_neutrophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv1_neutrophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv1_neutrophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 11830 low-count genes (8122 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv1_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(cv1_neutrophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 11830 low-count genes (8122 remaining).
## Setting 33 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv1_neutrophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv1_neutrophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 11830 low-count genes (8122 remaining).
## Setting 9 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv1_neutrophils_sva-v{ver}.csv"))cv2_neutrophils <- tcv2_neutrophils %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30058, TMRC30093, TMRC30118, TMRC30031, TMRC30195, TMRC30053, TMRC30158, TMRC30181, TMRC30076, TMRC30137, TMRC30143, TMRC30200, TMRC30218.
## subset_expt(): There were 18, now there are 5 samples.
save(list = "cv2_neutrophils", file = glue("rda/tmrc3_cv2_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "cv2_neutrophils")
cpm_data <- normalize_expt(cv2_neutrophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv2_neutrophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv2_neutrophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 12293 low-count genes (7659 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv2_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(cv2_neutrophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 12293 low-count genes (7659 remaining).
## Setting 10 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv2_neutrophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv2_neutrophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 12293 low-count genes (7659 remaining).
## Setting 1 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv2_neutrophils_sva-v{ver}.csv"))cv3_neutrophils <- tcv3_neutrophils %>%
subset_expt(subset = "clinic=='cali'")## The samples excluded are: TMRC30094, TMRC30170, TMRC30121, TMRC30027, TMRC30068, TMRC30160, TMRC30133, TMRC30088, TMRC30146, TMRC30202, TMRC30206, TMRC30220.
## subset_expt(): There were 19, now there are 7 samples.
save(list = "cv3_neutrophils", file = glue("rda/tmrc3_cv3_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "cv3_neutrophils")
cpm_data <- normalize_expt(cv3_neutrophils, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv3_neutrophils-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv3_neutrophils, filter = TRUE, convert = "rpkm",
na_to_zero = TRUE, column = "mean_cds_len")## Removing 11962 low-count genes (7990 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv3_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(cv3_neutrophils, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 11962 low-count genes (7990 remaining).
## Setting 28 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/cv3_neutrophils_sva-v{ver}.xlsx"))
rpkm_data <- normalize_expt(cv3_neutrophils, filter = TRUE, batch = "svaseq") %>%
normalize_expt(convert = "rpkm", column = "mean_cds_len", na_to_zero = TRUE)## Removing 11962 low-count genes (7990 remaining).
## Setting 7 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue("rpkm/cv3_neutrophils_sva-v{ver}.csv"))Is this redundant?
tc_visit <- set_expt_conditions(
tc_clinical, fact = "visitnumber", colors = color_choices[["visit2"]]) %>%
set_expt_batches(fact = "finaloutcome") %>%
subset_expt(subset = "typeofcells!='biopsy'")## The numbers of samples by condition are:
##
## 3 2 1
## 51 50 83
## The number of samples by batch are:
##
## cure failure
## 122 62
## The samples excluded are: TMRC30156, TMRC30269, TMRC30253, TMRC30270, TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30020, TMRC30022, TMRC30026, TMRC30044, TMRC30045, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241.
## subset_expt(): There were 184, now there are 166 samples.
save(list = "tc_visit", file = glue("rda/tmrc3_tc_visit-v{ver}.rda"))
data_structures <- c(data_structures, "tc_visit")
tc_v1vs <- tc_visit
pData(tc_v1vs)[["visit"]] <- as.character(pData(tc_v1vs)[["visitnumber"]])
v1_samples <- pData(tc_v1vs)[["visit"]] == "1"
pData(tc_v1vs)[v1_samples, "visit"] <- "first"
pData(tc_v1vs)[!v1_samples, "visit"] <- "later"
pData(tc_v1vs)[["visit"]] <- as.factor(pData(tc_v1vs)[["visit"]])
tc_v1vs <- set_expt_conditions(
tc_v1vs, fact = "visit", colors = color_choices[["two_visits"]])## The numbers of samples by condition are:
##
## first later
## 65 101
save(list = "tc_v1vs", file = glue("rda/tmrc3_tc_v1vs-v{ver}.rda"))
data_structures <- c(data_structures, "tc_v1vs")
t_v1vs <- subset_expt(tc_v1vs, subset = "clinic == 'tumaco'")## The samples excluded are: TMRC30185, TMRC30186, TMRC30178, TMRC30179, TMRC30221, TMRC30222, TMRC30223, TMRC30224, TMRC30148, TMRC30149, TMRC30150, TMRC30140, TMRC30138, TMRC30176, TMRC30153, TMRC30151, TMRC30234, TMRC30235, TMRC30225, TMRC30226, TMRC30227, TMRC30228, TMRC30229, TMRC30230, TMRC30231, TMRC30232, TMRC30233, TMRC30209, TMRC30210, TMRC30211, TMRC30212, TMRC30213, TMRC30216, TMRC30214, TMRC30215, TMRC30271, TMRC30273, TMRC30275, TMRC30272, TMRC30274, TMRC30276, TMRC30254, TMRC30255, TMRC30256, TMRC30277, TMRC30239, TMRC30240, TMRC30278, TMRC30279, TMRC30280, TMRC30257, TMRC30258, TMRC30281, TMRC30283, TMRC30284, TMRC30282, TMRC30285.
## subset_expt(): There were 166, now there are 109 samples.
save(list = "t_v1vs", file = glue("rda/tmrc3_t_v1vs-v{ver}.rda"))
data_structures <- c(data_structures, "t_v1vs")
t_visit <- set_expt_conditions(
t_clinical, fact = "visitnumber", colors = color_choices[["visit2"]]) %>%
set_expt_batches(fact = "finaloutcome") %>%
subset_expt(subset = "typeofcells!='biopsy'")## The numbers of samples by condition are:
##
## 3 2 1
## 34 35 54
## The number of samples by batch are:
##
## cure failure
## 67 56
## The samples excluded are: TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30020, TMRC30022, TMRC30026, TMRC30044, TMRC30045, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241.
## subset_expt(): There were 123, now there are 109 samples.
save(list = "t_visit", file = glue("rda/tmrc3_t_visit-v{ver}.rda"))
data_structures <- c(data_structures, "t_visit")
cpm_data <- normalize_expt(t_visit, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_visit-v{ver}.xlsx"))
cpm_data <- normalize_expt(t_visit, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 8042 low-count genes (11910 remaining).
## Setting 9636 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/t_visit_sva-v{ver}.xlsx"))visit_cf_expt_factor <- paste0("v", pData(t_clinical)[["visitnumber"]],
pData(t_clinical)[["finaloutcome"]])
t_visitcf <- set_expt_conditions(t_clinical, fact = visit_cf_expt_factor)## The numbers of samples by condition are:
##
## v1cure v1failure v2cure v2failure v3cure v3failure
## 30 24 20 15 17 17
save(list = "t_visitcf", file = glue("rda/tmrc3_t_visitcf-v{ver}.rda"))
data_structures <- c(data_structures, "t_visitcf")
t_visitcf_eosinophil <- subset_expt(t_visitcf, subset = "typeofcells=='eosinophils'")## The samples excluded are: TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30020, TMRC30056, TMRC30105, TMRC30058, TMRC30080, TMRC30094, TMRC30082, TMRC30103, TMRC30022, TMRC30169, TMRC30093, TMRC30107, TMRC30170, TMRC30096, TMRC30083, TMRC30115, TMRC30118, TMRC30014, TMRC30121, TMRC30030, TMRC30021, TMRC30026, TMRC30037, TMRC30031, TMRC30165, TMRC30027, TMRC30044, TMRC30194, TMRC30166, TMRC30195, TMRC30045, TMRC30046, TMRC30049, TMRC30055, TMRC30047, TMRC30191, TMRC30053, TMRC30041, TMRC30068, TMRC30171, TMRC30192, TMRC30139, TMRC30042, TMRC30158, TMRC30132, TMRC30160, TMRC30157, TMRC30183, TMRC30167, TMRC30123, TMRC30181, TMRC30072, TMRC30133, TMRC30078, TMRC30116, TMRC30184, TMRC30076, TMRC30129, TMRC30088, TMRC30172, TMRC30134, TMRC30174, TMRC30137, TMRC30142, TMRC30175, TMRC30145, TMRC30143, TMRC30197, TMRC30146, TMRC30199, TMRC30198, TMRC30201, TMRC30200, TMRC30203, TMRC30202, TMRC30205, TMRC30204, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241, TMRC30237, TMRC30206, TMRC30207, TMRC30238, TMRC30217, TMRC30208, TMRC30219, TMRC30218, TMRC30220, TMRC30264, TMRC30265.
## subset_expt(): There were 123, now there are 26 samples.
save(list = "t_visitcf_eosinophil", file = glue("rda/tmrc3_t_visitcf_eosinophil-v{ver}.rda"))
data_structures <- c(data_structures, "t_visitcf_eosinophil")
t_visitcf_monocyte <- subset_expt(t_visitcf, subset = "typeofcells=='monocytes'")## The samples excluded are: TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30071, TMRC30020, TMRC30113, TMRC30058, TMRC30164, TMRC30094, TMRC30119, TMRC30103, TMRC30122, TMRC30022, TMRC30093, TMRC30029, TMRC30170, TMRC30032, TMRC30083, TMRC30028, TMRC30118, TMRC30180, TMRC30121, TMRC30196, TMRC30021, TMRC30026, TMRC30031, TMRC30027, TMRC30044, TMRC30166, TMRC30195, TMRC30048, TMRC30054, TMRC30045, TMRC30070, TMRC30047, TMRC30053, TMRC30068, TMRC30192, TMRC30042, TMRC30158, TMRC30160, TMRC30167, TMRC30181, TMRC30133, TMRC30043, TMRC30116, TMRC30076, TMRC30159, TMRC30088, TMRC30134, TMRC30137, TMRC30161, TMRC30175, TMRC30143, TMRC30168, TMRC30146, TMRC30182, TMRC30198, TMRC30200, TMRC30202, TMRC30204, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241, TMRC30206, TMRC30136, TMRC30238, TMRC30074, TMRC30208, TMRC30077, TMRC30218, TMRC30079, TMRC30220, TMRC30135, TMRC30173, TMRC30144, TMRC30147, TMRC30265.
## subset_expt(): There were 123, now there are 42 samples.
save(list = "t_visitcf_monocyte", file = glue("rda/tmrc3_t_visitcf_monocyte-v{ver}.rda"))
data_structures <- c(data_structures, "t_visitcf_monocyte")
t_visitcf_neutrophil <- subset_expt(t_visitcf, subset = "typeofcells=='neutrophils'")## The samples excluded are: TMRC30016, TMRC30017, TMRC30018, TMRC30019, TMRC30071, TMRC30020, TMRC30056, TMRC30113, TMRC30105, TMRC30164, TMRC30080, TMRC30119, TMRC30082, TMRC30122, TMRC30022, TMRC30169, TMRC30029, TMRC30107, TMRC30032, TMRC30096, TMRC30028, TMRC30115, TMRC30180, TMRC30014, TMRC30196, TMRC30030, TMRC30026, TMRC30037, TMRC30165, TMRC30044, TMRC30194, TMRC30048, TMRC30054, TMRC30045, TMRC30046, TMRC30070, TMRC30049, TMRC30055, TMRC30191, TMRC30041, TMRC30171, TMRC30139, TMRC30132, TMRC30157, TMRC30183, TMRC30123, TMRC30072, TMRC30043, TMRC30078, TMRC30184, TMRC30159, TMRC30129, TMRC30172, TMRC30174, TMRC30161, TMRC30142, TMRC30145, TMRC30168, TMRC30197, TMRC30182, TMRC30199, TMRC30201, TMRC30203, TMRC30205, TMRC30152, TMRC30177, TMRC30155, TMRC30154, TMRC30241, TMRC30237, TMRC30136, TMRC30207, TMRC30074, TMRC30217, TMRC30077, TMRC30219, TMRC30079, TMRC30135, TMRC30173, TMRC30264, TMRC30144, TMRC30147.
## subset_expt(): There were 123, now there are 41 samples.
save(list = "t_visitcf_neutrophil", file = glue("rda/tmrc3_t_visitcf_neutrophil-v{ver}.rda"))
data_structures <- c(data_structures, "t_visitcf_neutrophil")Is this redundant?
c_visit <- set_expt_conditions(c_clinical, fact = "visitnumber") %>%
set_expt_batches(fact = "finaloutcome") %>%
subset_expt(subset = "typeofcells!='biopsy'")## The numbers of samples by condition are:
##
## 3 2 1
## 17 15 29
## The number of samples by batch are:
##
## cure failure
## 55 6
## The samples excluded are: TMRC30156, TMRC30269, TMRC30253, TMRC30270.
## subset_expt(): There were 61, now there are 57 samples.
save(list = "c_visit", file = glue("rda/tmrc3_c_visit-v{ver}.rda"))
data_structures <- c(data_structures, "c_visit")
cpm_data <- normalize_expt(c_visit, convert = "cpm")
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Tumaco/c_visit-v{ver}.xlsx"))
cpm_data <- normalize_expt(c_visit, convert = "cpm", filter = TRUE, batch = "svaseq")## Removing 8355 low-count genes (11597 remaining).
## Setting 3410 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel = glue("cpm/4_Cali/c_visit_sva-v{ver}.xlsx"))visit_cf_expt_factor <- paste0("v", pData(c_clinical)[["visitnumber"]],
pData(c_clinical)[["finaloutcome"]])
c_visitcf <- set_expt_conditions(c_clinical, fact = visit_cf_expt_factor)## The numbers of samples by condition are:
##
## v1cure v1failure v2cure v2failure v3cure v3failure
## 27 2 13 2 15 2
save(list = "c_visitcf", file = glue("rda/tmrc3_c_visitcf-v{ver}.rda"))
data_structures <- c(data_structures, "c_visitcf")
c_visitcf_eosinophil <- subset_expt(c_visitcf, subset = "typeofcells=='eosinophils'")## The samples excluded are: TMRC30156, TMRC30185, TMRC30186, TMRC30178, TMRC30179, TMRC30221, TMRC30222, TMRC30223, TMRC30224, TMRC30269, TMRC30148, TMRC30149, TMRC30253, TMRC30150, TMRC30140, TMRC30176, TMRC30153, TMRC30234, TMRC30235, TMRC30270, TMRC30225, TMRC30226, TMRC30228, TMRC30229, TMRC30231, TMRC30232, TMRC30209, TMRC30210, TMRC30212, TMRC30213, TMRC30214, TMRC30215, TMRC30273, TMRC30275, TMRC30274, TMRC30276, TMRC30255, TMRC30256, TMRC30239, TMRC30240, TMRC30279, TMRC30280, TMRC30258, TMRC30283, TMRC30284, TMRC30285.
## subset_expt(): There were 61, now there are 15 samples.
save(list = "c_visitcf_eosinophil", file = glue("rda/tmrc3_c_visitcf_eosinophil-v{ver}.rda"))
data_structures <- c(data_structures, "c_visitcf_eosinophil")
c_visitcf_monocyte <- subset_expt(c_visitcf, subset = "typeofcells=='monocytes'")## The samples excluded are: TMRC30156, TMRC30186, TMRC30179, TMRC30222, TMRC30224, TMRC30269, TMRC30149, TMRC30253, TMRC30140, TMRC30138, TMRC30153, TMRC30151, TMRC30235, TMRC30270, TMRC30226, TMRC30227, TMRC30229, TMRC30230, TMRC30232, TMRC30233, TMRC30210, TMRC30211, TMRC30213, TMRC30216, TMRC30215, TMRC30271, TMRC30275, TMRC30272, TMRC30276, TMRC30254, TMRC30256, TMRC30277, TMRC30240, TMRC30278, TMRC30280, TMRC30257, TMRC30281, TMRC30284, TMRC30282, TMRC30285.
## subset_expt(): There were 61, now there are 21 samples.
save(list = "c_visitcf_monocyte", file = glue("rda/tmrc3_c_visitcf_monocyte-v{ver}.rda"))
data_structures <- c(data_structures, "c_visitcf_monocyte")
c_visitcf_neutrophil <- subset_expt(c_visitcf, subset = "typeofcells=='neutrophils'")## The samples excluded are: TMRC30156, TMRC30185, TMRC30178, TMRC30221, TMRC30223, TMRC30269, TMRC30148, TMRC30253, TMRC30150, TMRC30138, TMRC30176, TMRC30151, TMRC30234, TMRC30270, TMRC30225, TMRC30227, TMRC30228, TMRC30230, TMRC30231, TMRC30233, TMRC30209, TMRC30211, TMRC30212, TMRC30216, TMRC30214, TMRC30271, TMRC30273, TMRC30272, TMRC30274, TMRC30254, TMRC30255, TMRC30277, TMRC30239, TMRC30278, TMRC30279, TMRC30257, TMRC30258, TMRC30281, TMRC30283, TMRC30282.
## subset_expt(): There were 61, now there are 21 samples.
save(list = "c_visitcf_neutrophil", file = glue("rda/tmrc3_c_visitcf_neutrophil-v{ver}.rda"))
data_structures <- c(data_structures, "c_visitcf_neutrophil")ncol(exprs(tc_clinical))## [1] 184
sum(pData(tc_clinical)[["finaloutcome"]] == "cure")## [1] 122
sum(pData(tc_clinical)[["finaloutcome"]] == "failure")## [1] 62
all_types[["biopsy"]]## [1] 18
sum(pData(tc_biopsies)[["finaloutcome"]] == "cure")## [1] 13
sum(pData(tc_biopsies)[["finaloutcome"]] == "failure")## [1] 5
all_types[["eosinophils"]]## [1] 41
sum(pData(tc_eosinophils)[["finaloutcome"]] == "cure")## [1] 32
sum(pData(tc_eosinophils)[["finaloutcome"]] == "failure")## [1] 9
nrow(pData(tcv1_eosinophils))## [1] 14
sum(pData(tcv1_eosinophils)[["finaloutcome"]] == "cure")## [1] 11
sum(pData(tcv1_eosinophils)[["finaloutcome"]] == "failure")## [1] 3
nrow(pData(tcv2_eosinophils))## [1] 14
sum(pData(tcv2_eosinophils)[["finaloutcome"]] == "cure")## [1] 11
sum(pData(tcv2_eosinophils)[["finaloutcome"]] == "failure")## [1] 3
nrow(pData(tcv3_eosinophils))## [1] 13
sum(pData(tcv3_eosinophils)[["finaloutcome"]] == "cure")## [1] 10
sum(pData(tcv3_eosinophils)[["finaloutcome"]] == "failure")## [1] 3
all_types[["monocytes"]]## [1] 63
sum(pData(tc_monocytes)[["finaloutcome"]] == "cure")## [1] 39
sum(pData(tc_monocytes)[["finaloutcome"]] == "failure")## [1] 24
nrow(pData(tcv1_monocytes))## [1] 26
sum(pData(tcv1_monocytes)[["finaloutcome"]] == "cure")## [1] 17
sum(pData(tcv1_monocytes)[["finaloutcome"]] == "failure")## [1] 9
nrow(pData(tcv2_monocytes))## [1] 18
sum(pData(tcv2_monocytes)[["finaloutcome"]] == "cure")## [1] 11
sum(pData(tcv2_monocytes)[["finaloutcome"]] == "failure")## [1] 7
nrow(pData(tcv3_monocytes))## [1] 19
sum(pData(tcv3_monocytes)[["finaloutcome"]] == "cure")## [1] 11
sum(pData(tcv3_monocytes)[["finaloutcome"]] == "failure")## [1] 8
all_types[["neutrophils"]]## [1] 62
sum(pData(tc_neutrophils)[["finaloutcome"]] == "cure")## [1] 38
sum(pData(tc_neutrophils)[["finaloutcome"]] == "failure")## [1] 24
nrow(pData(tcv1_neutrophils))## [1] 25
sum(pData(tcv1_neutrophils)[["finaloutcome"]] == "cure")## [1] 16
sum(pData(tcv1_neutrophils)[["finaloutcome"]] == "failure")## [1] 9
nrow(pData(tcv2_monocytes))## [1] 18
sum(pData(tcv2_neutrophils)[["finaloutcome"]] == "cure")## [1] 11
sum(pData(tcv2_neutrophils)[["finaloutcome"]] == "failure")## [1] 7
nrow(pData(tcv3_neutrophils))## [1] 19
sum(pData(tcv3_neutrophils)[["finaloutcome"]] == "cure")## [1] 11
sum(pData(tcv3_neutrophils)[["finaloutcome"]] == "failure")## [1] 8
Here is an outline of the samples in their current state:
ncol(exprs(t_clinical))## [1] 123
sum(pData(t_clinical)[["finaloutcome"]] == "cure")## [1] 67
sum(pData(t_clinical)[["finaloutcome"]] == "failure")## [1] 56
all_types[["biopsy"]]## [1] 18
sum(pData(t_biopsies)[["finaloutcome"]] == "cure")## [1] 9
sum(pData(t_biopsies)[["finaloutcome"]] == "failure")## [1] 5
all_types[["eosinophils"]]## [1] 41
sum(pData(t_eosinophils)[["finaloutcome"]] == "cure")## [1] 17
sum(pData(t_eosinophils)[["finaloutcome"]] == "failure")## [1] 9
nrow(pData(tv1_eosinophils))## [1] 8
sum(pData(tv1_eosinophils)[["finaloutcome"]] == "cure")## [1] 5
sum(pData(tv1_eosinophils)[["finaloutcome"]] == "failure")## [1] 3
nrow(pData(tv2_eosinophils))## [1] 9
sum(pData(tv2_eosinophils)[["finaloutcome"]] == "cure")## [1] 6
sum(pData(tv2_eosinophils)[["finaloutcome"]] == "failure")## [1] 3
nrow(pData(tv3_eosinophils))## [1] 9
sum(pData(tv3_eosinophils)[["finaloutcome"]] == "cure")## [1] 6
sum(pData(tv3_eosinophils)[["finaloutcome"]] == "failure")## [1] 3
all_types[["monocytes"]]## [1] 63
sum(pData(t_monocytes)[["finaloutcome"]] == "cure")## [1] 21
sum(pData(t_monocytes)[["finaloutcome"]] == "failure")## [1] 21
nrow(pData(tv1_monocytes))## [1] 16
sum(pData(tv1_monocytes)[["finaloutcome"]] == "cure")## [1] 8
sum(pData(tv1_monocytes)[["finaloutcome"]] == "failure")## [1] 8
nrow(pData(tv2_monocytes))## [1] 13
sum(pData(tv2_monocytes)[["finaloutcome"]] == "cure")## [1] 7
sum(pData(tv2_monocytes)[["finaloutcome"]] == "failure")## [1] 6
nrow(pData(tv3_monocytes))## [1] 13
sum(pData(tv3_monocytes)[["finaloutcome"]] == "cure")## [1] 6
sum(pData(tv3_monocytes)[["finaloutcome"]] == "failure")## [1] 7
all_types[["neutrophils"]]## [1] 62
sum(pData(t_neutrophils)[["finaloutcome"]] == "cure")## [1] 20
sum(pData(t_neutrophils)[["finaloutcome"]] == "failure")## [1] 21
nrow(pData(tv1_neutrophils))## [1] 16
sum(pData(tv1_neutrophils)[["finaloutcome"]] == "cure")## [1] 8
sum(pData(tv1_neutrophils)[["finaloutcome"]] == "failure")## [1] 8
nrow(pData(tv2_monocytes))## [1] 13
sum(pData(tv2_neutrophils)[["finaloutcome"]] == "cure")## [1] 7
sum(pData(tv2_neutrophils)[["finaloutcome"]] == "failure")## [1] 6
nrow(pData(tv3_neutrophils))## [1] 12
sum(pData(tv3_neutrophils)[["finaloutcome"]] == "cure")## [1] 5
sum(pData(tv3_neutrophils)[["finaloutcome"]] == "failure")## [1] 7
ncol(exprs(c_clinical))## [1] 61
sum(pData(c_clinical)[["finaloutcome"]] == "cure")## [1] 55
sum(pData(c_clinical)[["finaloutcome"]] == "failure")## [1] 6
#all_types[["biopsy"]]
sum(pData(c_biopsies)[["finaloutcome"]] == "cure")## [1] 4
sum(pData(c_biopsies)[["finaloutcome"]] == "failure")## [1] 0
#all_types[["eosinophils"]]
sum(pData(c_eosinophils)[["finaloutcome"]] == "cure")## [1] 15
sum(pData(c_eosinophils)[["finaloutcome"]] == "failure")## [1] 0
nrow(pData(cv1_eosinophils))## [1] 6
sum(pData(cv1_eosinophils)[["finaloutcome"]] == "cure")## [1] 6
sum(pData(cv1_eosinophils)[["finaloutcome"]] == "failure")## [1] 0
nrow(pData(cv2_eosinophils))## [1] 5
sum(pData(cv2_eosinophils)[["finaloutcome"]] == "cure")## [1] 5
sum(pData(cv2_eosinophils)[["finaloutcome"]] == "failure")## [1] 0
nrow(pData(cv3_eosinophils))## [1] 4
sum(pData(cv3_eosinophils)[["finaloutcome"]] == "cure")## [1] 4
sum(pData(cv3_eosinophils)[["finaloutcome"]] == "failure")## [1] 0
#all_types[["monocytes"]]
sum(pData(c_monocytes)[["finaloutcome"]] == "cure")## [1] 18
sum(pData(c_monocytes)[["finaloutcome"]] == "failure")## [1] 3
nrow(pData(cv1_monocytes))## [1] 10
sum(pData(cv1_monocytes)[["finaloutcome"]] == "cure")## [1] 9
sum(pData(cv1_monocytes)[["finaloutcome"]] == "failure")## [1] 1
nrow(pData(cv2_monocytes))## [1] 5
sum(pData(cv2_monocytes)[["finaloutcome"]] == "cure")## [1] 4
sum(pData(cv2_monocytes)[["finaloutcome"]] == "failure")## [1] 1
nrow(pData(cv3_monocytes))## [1] 6
sum(pData(cv3_monocytes)[["finaloutcome"]] == "cure")## [1] 5
sum(pData(cv3_monocytes)[["finaloutcome"]] == "failure")## [1] 1
#all_types[["neutrophils"]]
sum(pData(c_neutrophils)[["finaloutcome"]] == "cure")## [1] 18
sum(pData(c_neutrophils)[["finaloutcome"]] == "failure")## [1] 3
nrow(pData(cv1_neutrophils))## [1] 9
sum(pData(cv1_neutrophils)[["finaloutcome"]] == "cure")## [1] 8
sum(pData(cv1_neutrophils)[["finaloutcome"]] == "failure")## [1] 1
nrow(pData(cv2_monocytes))## [1] 5
sum(pData(cv2_neutrophils)[["finaloutcome"]] == "cure")## [1] 4
sum(pData(cv2_neutrophils)[["finaloutcome"]] == "failure")## [1] 1
nrow(pData(cv3_neutrophils))## [1] 7
sum(pData(cv3_neutrophils)[["finaloutcome"]] == "cure")## [1] 6
sum(pData(cv3_neutrophils)[["finaloutcome"]] == "failure")## [1] 1
One potentially interesting comparisons point, especially vis a vis the Biopsy samples, is to compare with another set of recent L. braziliensis infected biopsy samples. The following sets up an expressionset using the raw data from bioproject PRJNA525604.
Note 202409: I just added the scott tsv metadata to this container and thus to this data structure; this only works because the samples are nicely in the same order.
new_sample_sheet <- sm(
suppressWarnings(gather_preprocessing_metadata(
"sample_sheets/scott_sra_samples.xlsx", specification = make_rnaseq_spec())))
scott_metadata <- as.data.frame(readr::read_tsv("sample_sheets/scott_capsule_studydesign.tsv"))## Rows: 28 Columns: 8
## -- Column specification --------------------------------------------------------
## Delimiter: "\t"
## chr (8): sample, disease, treatment_outcome, age, sex, DTH, lesion_size, Tim...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
new_meta <- cbind.data.frame(new_sample_sheet[["new_meta"]], scott_metadata)
external_cf <- create_expt(new_meta,
gene_info = hs_annot, file_column = "hisatcounttable") %>%
set_expt_conditions(fact = "treatmentoutcome") %>%
set_expt_batches(fact = "sex") %>%
subset_expt(subset = "condition!='na'")## Reading the sample metadata.
## The sample definitions comprises: 28 rows(samples) and 47 columns(metadata fields).
## Matched 21476 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 21481 features and 28 samples.
## The numbers of samples by condition are:
##
## cure failure na
## 14 7 7
## The number of samples by batch are:
##
## female male na
## 4 17 7
## The samples excluded are: SRR8668755, SRR8668756, SRR8668757, SRR8668758, SRR8668759, SRR8668760, SRR8668761.
## subset_expt(): There were 28, now there are 21 samples.
pData(external_cf)[["ParasiteSpecies"]] <- "lvbraziliensis"
save(list = "external_cf", file = glue("rda/tmrc3_external_cf-v{ver}.rda"))
data_structures <- c(data_structures, "external_cf")
biopsy_cf <- set_expt_conditions(tc_biopsies, fact = "finaloutcome")## The numbers of samples by condition are:
##
## cure failure
## 13 5
pData(biopsy_cf)[["lab"]] <- "Colombia"
pData(external_cf)[["lab"]] <- "Brazil"
pData(external_cf)[["finaloutcome"]] <- pData(external_cf)[["treatmentoutcome"]]
tmrc3_external <- combine_expts(external_cf, biopsy_cf) %>%
set_expt_conditions(fact = "lab", colors = color_choices[["labs"]]) %>%
set_expt_batches(fact = "finaloutcome")## The numbers of samples by condition are:
##
## failure cure
## 12 27
## Warning in combine_expts(external_cf, biopsy_cf): There appear to be NA values
## in the new expressionset. Beware.
## The numbers of samples by condition are:
##
## Brazil Colombia
## 21 18
## The number of samples by batch are:
##
## failure cure
## 12 27
save(list = "tmrc3_external", file = glue("rda/tmrc3_external-v{ver}.rda"))
data_structures <- c(data_structures, "tmrc3_external")
tmrc3_external_species <- set_expt_conditions(
tmrc3_external, fact = "ParasiteSpecies", colors = color_choices[["parasite"]]) %>%
set_expt_batches(fact = "lab")## The numbers of samples by condition are:
##
## lvbraziliensis lvpanamensis notapplicable
## 22 14 3
## Warning in set_expt_colors(new_expt, colors = colors): Colors for the following
## categories are not being used: lvguyanensis.
## The number of samples by batch are:
##
## Brazil Colombia
## 21 18
save(list = "tmrc3_external_species", file = glue("rda/tmrc3_external_sepcies-v{ver}.rda"))
data_structures <- c(data_structures, "tmrc3_external_species")
## In my singularity container, this fails with a pthread problem, not doing quant here.
#tt <- normalize_expt(tmrc3_external_species, filter = TRUE, transform = "log2",
# convert = "cpm", norm = "quant")
tt <- normalize_expt(tmrc3_external_species, filter = TRUE, transform = "log2",
convert = "cpm", norm = "tmm")## Removing 6904 low-count genes (14577 remaining).
## transform_counts: Found 7483 values equal to 0, adding 1 to the matrix.
plot_pca(tt)## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by lvbraziliensis, lvpanamensis, notapplicable
## Shapes are defined by Brazil, Colombia.
confused <- set_expt_conditions(
hs_expt, fact = "ParasiteSpecies", colors = color_choices[["parasite"]])## The numbers of samples by condition are:
##
## lvbraziliensis lvguyanensis lvpanamensis notapplicable
## 15 2 114 79
tc_etnia_expt <- set_expt_conditions(
tc_clinical, fact = "etnia", colors = color_choices[["ethnicity"]])## The numbers of samples by condition are:
##
## afrocol indigena mestiza
## 91 46 47
save(list = "tc_etnia_expt", file = glue("rda/tmrc3_tc_etnia-v{ver}.rda"))
data_structures <- c(data_structures, "tc_etnia_expt")
t_etnia_expt <- set_expt_conditions(
t_clinical, fact = "etnia", colors = color_choices[["ethnicity"]])## The numbers of samples by condition are:
##
## afrocol indigena mestiza
## 76 19 28
save(list = "t_etnia_expt", file = glue("rda/tmrc3_t_etnia-v{ver}.rda"))
data_structures <- c(data_structures, "t_etnia_expt")haha I knew I should not have removed the following block. In previous iterations of this I included a datastructure which pulled in the parasite reads from all samples with ‘sufficient’ coverage. This is not very many samples, for the simple reason that the drug treatment basically works! As a result, we removed the following section from the container (also for space). Oh, I may need to copy some more count tables into the container recipe for this to work.
It should be noted (if anyone ever reads this), that I also did a version of all of these analyses where I explicitly made a concatenated database (genome/gff/CDS) of human and parasite references (e.g. for salmon/hisat/bwa/etc) and did all of this using it in order to satisfy myself that the various treatments are similar enough to be called the same (I just saw that block while hunting for this parasite data and thought it was worth noting).
In any event, the following is copy/pasted from my working tree:
Make an expressionset of the parasite reads in the TMRC3 samples and distinguish between the faux and real reads. E.g: Are there samples which definitively contain parasites?
Later, I manually went through the mappings of samples with a significant number of parasite reads in IGV with some TMRC2 known zymodeme samples. In many cases it is possible to state definitively the classification of the parasite which infected an individual.
Note: I am renaming the resulting data structures a little because we generally do not call things ‘tmrc2/tmrc3/etc’ in the container.
lp_expt <- create_expt(
samplesheet,
file_column = "lpanamensisv36hisatfile", gene_info = all_lp_annot) %>%
subset_expt(coverage = 30000) %>%
subset_expt(nonzero = 3000) %>%
set_expt_conditions(fact = "typeofcells") %>%
subset_expt(subset = "clinic!='undefined'") %>%
sanitize_expt_pData("finaloutcome")## Reading the sample metadata.
## Error in eval(expr, envir, enclos) :
## basic_string::substr: __pos (which is 8) > this->size() (which is 5)
## Did not find the column: sampleid.
## Setting the ID column to the first column.
## Dropped 69 rows from the sample metadata because the sample ID is blank.
## Warning in extract_metadata(metadata, id_column = id_column, ...): There were
## NA values in the condition column, setting them to 'undefined'.
## Warning in extract_metadata(metadata, id_column = id_column, ...): There were
## NA values in the condition column, setting them to 'undefined'.
## The sample definitions comprises: 254 rows(samples) and 88 columns(metadata fields).
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30156/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30185/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30186/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30178/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30179/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30221/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30222/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30223/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30224/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30269/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30148/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30149/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30253/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30150/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30140/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30138/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30176/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30153/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30151/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30234/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30235/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30270/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30225/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30226/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30227/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30228/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30229/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30230/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30231/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30232/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30233/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30209/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30210/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30211/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30212/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30213/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30216/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30214/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30215/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30271/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30273/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30275/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30272/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30274/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30276/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30254/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30255/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30256/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30277/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30239/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30240/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30278/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30279/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30280/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30257/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30258/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30281/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30283/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30284/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30282/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30050/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30285/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30071/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30056/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30052/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30113/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30105/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30058/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30164/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30080/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30094/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30119/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30059/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30060/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30061/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30062/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30063/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30051/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30064/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30065/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30162/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30066/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30067/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30117/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30057/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30069/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30082/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30103/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30122/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30169/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30093/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30107/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30170/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30096/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30083/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30115/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30118/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30180/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30121/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30196/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30165/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30194/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30166/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30195/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30048/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30054/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30046/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30070/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30049/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30055/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30047/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30191/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30053/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30068/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30171/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30192/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30139/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30188/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30158/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30132/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30160/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30157/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30189/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30183/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30167/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30123/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30181/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30072/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30133/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30078/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30116/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30184/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30076/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30159/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30129/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30088/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30172/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30134/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30174/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30137/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30161/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30142/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30175/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30190/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30145/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30143/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30168/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30197/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30146/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30182/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30199/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30198/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30266/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30268/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30286/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30249/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30252/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30250/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30251/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30245/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30246/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30247/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30248/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30244/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30201/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30200/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30203/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30202/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30205/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30204/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30152/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30177/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30155/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30154/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30241/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30242/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30237/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30206/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30136/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30207/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30238/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30074/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30217/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30208/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30077/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30219/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30218/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30079/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30260/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30220/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30262/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30261/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30135/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30173/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30264/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30263/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30144/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30147/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in first_rownames != current_rownames: longer object length is not a
## multiple of shorter object length
## Warning in read_counts_expt(sample_ids, filenames, countdir = countdir, : The
## file:
## /mnt/nfs/z1/sw/singularity/cure_fail_host_analyses/202411041814_outputs/preprocessing/TMRC30265/outputs/03hisat2_lpanamensis_v36/sno_gene_gene_id.count.xz
## has mismatched rownames.
## Warning in create_expt(samplesheet, file_column = "lpanamensisv36hisatfile", :
## Some samples were removed when cross referencing the samples against the count
## data.
## Warning in create_expt(samplesheet, file_column = "lpanamensisv36hisatfile", :
## There are some NAs in this data, the 'handle_nas' parameter may be required.
## Warning in create_expt(samplesheet, file_column = "lpanamensisv36hisatfile", :
## The following samples have no counts: TMRC30010, TMRC30185, TMRC30221,
## TMRC30222, TMRC30223, TMRC30234, TMRC30235, TMRC30225, TMRC30226, TMRC30227,
## TMRC30228, TMRC30229, TMRC30230, TMRC30231, TMRC30233, TMRC30209, TMRC30210,
## TMRC30211, TMRC30212, TMRC30213, TMRC30216, TMRC30214, TMRC30272, TMRC30274,
## TMRC30278, TMRC30279, TMRC30285, TMRC30145, TMRC30146, TMRC30207, TMRC30217,
## TMRC30219, TMRC30220, TMRC30144
## If the handle_na parameter is 'drop', this will result in an empty dataset.
## Matched 8778 annotations and counts.
## Bringing together the count matrix and gene information.
## Warning in create_expt(samplesheet, file_column = "lpanamensisv36hisatfile", :
## The following samples have no counts!
## TMRC30010TMRC30185TMRC30221TMRC30222TMRC30223TMRC30234TMRC30235TMRC30225TMRC30226TMRC30227TMRC30228TMRC30229TMRC30230TMRC30231TMRC30233TMRC30209TMRC30210TMRC30211TMRC30212TMRC30213TMRC30216TMRC30214TMRC30272TMRC30274TMRC30278TMRC30279TMRC30285TMRC30145TMRC30146TMRC30207TMRC30217TMRC30219TMRC30220TMRC30144
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 8778 features and 244 samples.
## The samples removed (and read coverage) when filtering samples with less than 30000 reads are:
## TMRC30001 TMRC30002 TMRC30003 TMRC30004 TMRC30005 TMRC30006 TMRC30007 TMRC30008
## 12 8 9 16 25 29 3 16
## TMRC30009 TMRC30010 TMRC30015 TMRC30011 TMRC30012 TMRC30013 TMRC30185 TMRC30186
## 16 0 1476 5 9 13 0 852
## TMRC30178 TMRC30179 TMRC30221 TMRC30222 TMRC30223 TMRC30224 TMRC30269 TMRC30148
## 420 515 0 0 0 2 58 706
## TMRC30149 TMRC30150 TMRC30140 TMRC30138 TMRC30176 TMRC30153 TMRC30151 TMRC30234
## 4713 470 153 156 1158 1497 480 0
## TMRC30235 TMRC30270 TMRC30225 TMRC30226 TMRC30227 TMRC30228 TMRC30229 TMRC30230
## 0 21 0 0 0 0 0 0
## TMRC30231 TMRC30232 TMRC30233 TMRC30018 TMRC30209 TMRC30210 TMRC30211 TMRC30212
## 0 1 0 110 0 0 0 0
## TMRC30213 TMRC30216 TMRC30214 TMRC30215 TMRC30271 TMRC30273 TMRC30275 TMRC30272
## 0 0 0 1 4 2 2 0
## TMRC30274 TMRC30276 TMRC30254 TMRC30255 TMRC30256 TMRC30277 TMRC30239 TMRC30240
## 0 1 208 8435 15767 6 209 314
## TMRC30278 TMRC30279 TMRC30280 TMRC30257 TMRC30019 TMRC30258 TMRC30281 TMRC30283
## 0 0 3 427 2300 5685 548 18
## TMRC30284 TMRC30282 TMRC30050 TMRC30285 TMRC30020 TMRC30056 TMRC30052 TMRC30113
## 4 1 345 0 1150 2679 1416 3106
## TMRC30105 TMRC30058 TMRC30164 TMRC30080 TMRC30119 TMRC30059 TMRC30060 TMRC30066
## 17794 8032 424 2775 419 9107 13418 3141
## TMRC30117 TMRC30082 TMRC30103 TMRC30122 TMRC30169 TMRC30029 TMRC30025 TMRC30107
## 1186 4915 16417 6325 4501 2267 954 12245
## TMRC30170 TMRC30032 TMRC30096 TMRC30028 TMRC30115 TMRC30118 TMRC30180 TMRC30014
## 4781 22 16058 93 2048 747 2781 4
## TMRC30196 TMRC30030 TMRC30021 TMRC30023 TMRC30026 TMRC30037 TMRC30031 TMRC30165
## 687 3 88 120 5047 9 8 433
## TMRC30027 TMRC30194 TMRC30033 TMRC30038 TMRC30166 TMRC30036 TMRC30024 TMRC30195
## 108 198 36 208 496 11 49 458
## TMRC30048 TMRC30034 TMRC30039 TMRC30054 TMRC30045 TMRC30046 TMRC30040 TMRC30070
## 14152 21 1147 10956 334 16608 896 17528
## TMRC30049 TMRC30035 TMRC30055 TMRC30191 TMRC30053 TMRC30041 TMRC30068 TMRC30171
## 14002 153 16987 1532 17035 264 12531 370
## TMRC30192 TMRC30139 TMRC30042 TMRC30188 TMRC30158 TMRC30132 TMRC30160 TMRC30157
## 384 140 545 20820 284 1448 296 18641
## TMRC30189 TMRC30183 TMRC30167 TMRC30123 TMRC30181 TMRC30133 TMRC30043 TMRC30078
## 307 11681 1208 304 613 1599 571 4351
## TMRC30116 TMRC30184 TMRC30076 TMRC30159 TMRC30129 TMRC30088 TMRC30172 TMRC30134
## 7522 16796 2828 405 353 11379 383 15577
## TMRC30174 TMRC30137 TMRC30161 TMRC30142 TMRC30175 TMRC30190 TMRC30145 TMRC30143
## 644 389 300 4 1191 537 0 1
## TMRC30168 TMRC30197 TMRC30146 TMRC30182 TMRC30199 TMRC30198 TMRC30266 TMRC30268
## 14823 202 0 468 15264 158 822 3444
## TMRC30249 TMRC30246 TMRC30244 TMRC30201 TMRC30200 TMRC30203 TMRC30202 TMRC30205
## 14507 2826 1662 149 96 104 145 169
## TMRC30204 TMRC30152 TMRC30155 TMRC30154 TMRC30242 TMRC30237 TMRC30206 TMRC30136
## 1298 495 508 398 15023 2792 125 2170
## TMRC30207 TMRC30238 TMRC30074 TMRC30217 TMRC30208 TMRC30077 TMRC30219 TMRC30218
## 0 335 13616 0 1 11313 0 2
## TMRC30079 TMRC30260 TMRC30220 TMRC30262 TMRC30261 TMRC30135 TMRC30173 TMRC30264
## 6629 291 0 905 500 3906 898 309
## TMRC30263 TMRC30144 TMRC30147 TMRC30265
## 431 0 1 596
## subset_expt(): There were 244, now there are 32 samples.
## No samples have fewer than 3000 observed genes.
## The numbers of samples by condition are:
##
## Biopsy Eosinophils Macrophages Monocytes Neutrophils
## 8 1 17 1 5
## The samples excluded are: TMRC30061, TMRC30062, TMRC30063, TMRC30051, TMRC30064, TMRC30065, TMRC30162, TMRC30067, TMRC30057, TMRC30069, TMRC30286, TMRC30252, TMRC30250, TMRC30251, TMRC30245, TMRC30247, TMRC30248.
## subset_expt(): There were 32, now there are 15 samples.
visit_fact <- pData(lp_expt)[["visitnumber"]]
batch_na <- is.na(visit_fact)
visit_fact[batch_na] <- "undefined"
lp_expt <- set_expt_batches(lp_expt, fact = visit_fact)## The number of samples by batch are:
##
## 1 2 3
## 11 2 2
save(list = "lp_expt", file = glue("rda/lp_expt_all_sanitized-v{ver}.rda"))
data_structures <- c(data_structures, "lp_expt")The above block is similar in concept to the previous expressionset creation. It uses a different column and currently ignores the gene annotations. Given that many of the samples have essentially 0 reads, I set a cutoff of only 20 observed genes. Finally, I did a quick and dirty PCA plot of this peculiar data structure in the hopes of being able to ‘see’ the difference between what are assumed to be ‘real’ samples with a significant number of ‘real’ parasite reads vs. those samples which just have a couple of potentially spurious reads.
found_idx <- data_structures %in% ls()
if (sum(!found_idx) > 0) {
not_found <- data_structures[!found_idx]
warning("Some datastructures were not generated: ", toString(not_found), ".")
data_structures <- data_structures[found_idx]
}
save(list = data_structures, file = glue("rda/tmrc3_data_structures-v{ver}.rda"))pander::pander(sessionInfo())R version 4.4.1 (2024-06-14)
Platform: x86_64-conda-linux-gnu
locale: C
attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: BiocParallel(v.1.36.0), variancePartition(v.1.32.5), ruv(v.0.9.7.1), org.Lpanamensis.MHOMCOL81L13.v68.eg.db(v.2024.09), AnnotationDbi(v.1.64.1), tidyr(v.1.3.1), hpgltools(v.1.0), Matrix(v.1.6-5), SummarizedExperiment(v.1.32.0), GenomicRanges(v.1.54.1), GenomeInfoDb(v.1.38.6), IRanges(v.2.36.0), S4Vectors(v.0.40.2), MatrixGenerics(v.1.14.0), matrixStats(v.1.2.0), Biobase(v.2.62.0), BiocGenerics(v.0.48.1), glue(v.1.7.0), forcats(v.1.0.0) and dplyr(v.1.1.4)
loaded via a namespace (and not attached): splines(v.4.4.1), later(v.1.3.2), BiocIO(v.1.12.0), bitops(v.1.0-7), filelock(v.1.0.3), tibble(v.3.2.1), graph(v.1.80.0), XML(v.3.99-0.16.1), lifecycle(v.1.0.4), Rdpack(v.2.6), doParallel(v.1.0.17), fastcluster(v.1.2.6), edgeR(v.4.0.16), rprojroot(v.2.0.4), vroom(v.1.6.5), lattice(v.0.22-5), MASS(v.7.3-60.0.1), backports(v.1.4.1), magrittr(v.2.0.3), openxlsx(v.4.2.5.2), limma(v.3.58.1), plotly(v.4.10.4), sass(v.0.4.8), rmarkdown(v.2.25), jquerylib(v.0.1.4), yaml(v.2.3.8), httpuv(v.1.6.14), zip(v.2.3.1), minqa(v.1.2.6), cowplot(v.1.1.3), DBI(v.1.2.2), RColorBrewer(v.1.1-3), abind(v.1.4-5), pkgload(v.1.3.4), zlibbioc(v.1.48.0), EnvStats(v.2.8.1), Rtsne(v.0.17), purrr(v.1.0.2), RCurl(v.1.98-1.14), yulab.utils(v.0.1.7), rappdirs(v.0.3.3), sva(v.3.50.0), GenomeInfoDbData(v.1.2.11), pbkrtest(v.0.5.2), ggrepel(v.0.9.5), genefilter(v.1.84.0), testthat(v.3.2.1), annotate(v.1.80.0), codetools(v.0.2-19), DelayedArray(v.0.28.0), DOSE(v.3.28.2), xml2(v.1.3.6), tidyselect(v.1.2.0), farver(v.2.1.1), lme4(v.1.1-35.1), BiocFileCache(v.2.10.1), GenomicAlignments(v.1.38.2), jsonlite(v.1.8.8), ggsankey(v.0.0.99999), ellipsis(v.0.3.2), survival(v.3.5-8), iterators(v.1.0.14), foreach(v.1.5.2), tools(v.4.4.1), progress(v.1.2.3), Rcpp(v.1.0.12), gridExtra(v.2.3), SparseArray(v.1.2.4), mgcv(v.1.9-1), xfun(v.0.42), qvalue(v.2.34.0), numDeriv(v.2016.8-1.1), withr(v.3.0.0), fastmap(v.1.1.1), boot(v.1.3-29), fansi(v.1.0.6), caTools(v.1.18.2), digest(v.0.6.34), R6(v.2.5.1), mime(v.0.12), colorspace(v.2.1-0), GO.db(v.3.18.0), gtools(v.3.9.5), biomaRt(v.2.58.2), RSQLite(v.2.3.5), RhpcBLASctl(v.0.23-42), utf8(v.1.2.4), generics(v.0.1.3), data.table(v.1.15.0), corpcor(v.1.6.10), rtracklayer(v.1.62.0), prettyunits(v.1.2.0), httr(v.1.4.7), htmlwidgets(v.1.6.4), S4Arrays(v.1.2.0), pkgconfig(v.2.0.3), gtable(v.0.3.4), blob(v.1.2.4), XVector(v.0.42.0), remaCor(v.0.0.18), brio(v.1.1.4), htmltools(v.0.5.7), fgsea(v.1.28.0), GSEABase(v.1.64.0), scales(v.1.3.0), png(v.0.1-8), fANCOVA(v.0.6-1), knitr(v.1.45), tzdb(v.0.4.0), reshape2(v.1.4.4), rjson(v.0.2.21), nloptr(v.2.0.3), nlme(v.3.1-164), curl(v.5.2.0), cachem(v.1.0.8), stringr(v.1.5.1), KernSmooth(v.2.23-22), parallel(v.4.4.1), HDO.db(v.0.99.1), restfulr(v.0.0.15), desc(v.1.4.3), pillar(v.1.9.0), grid(v.4.4.1), vctrs(v.0.6.5), gplots(v.3.1.3.1), promises(v.1.2.1), dbplyr(v.2.4.0), xtable(v.1.8-4), waldo(v.0.5.2), evaluate(v.0.23), readr(v.2.1.5), GenomicFeatures(v.1.54.3), mvtnorm(v.1.2-4), cli(v.3.6.2), locfit(v.1.5-9.8), compiler(v.4.4.1), Rsamtools(v.2.18.0), rlang(v.1.1.3), crayon(v.1.5.2), labeling(v.0.4.3), plyr(v.1.8.9), fs(v.1.6.3), pander(v.0.6.5), stringi(v.1.8.3), viridisLite(v.0.4.2), lmerTest(v.3.1-3), munsell(v.0.5.0), Biostrings(v.2.70.2), lazyeval(v.0.2.2), aod(v.1.3.3), GOSemSim(v.2.28.1), hms(v.1.1.3), bit64(v.4.0.5), ggplot2(v.3.5.0), KEGGREST(v.1.42.0), statmod(v.1.5.0), varhandle(v.2.0.6), shiny(v.1.8.0), highr(v.0.10), rbibutils(v.2.2.16), broom(v.1.0.5), memoise(v.2.0.1), bslib(v.0.6.1), fastmatch(v.1.1-4) and bit(v.4.0.5)
message("This is hpgltools commit: ", get_git_commit())## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 08c5d4e0f4602762a9507c7b53068ac99f62c683
## This is hpgltools commit: Mon Nov 4 14:19:02 2024 -0500: 08c5d4e0f4602762a9507c7b53068ac99f62c683
message("Saving to ", savefile)## Saving to 01datasets.rda.xz
# tmp <- sm(saveme(filename = savefile))tmp <- loadme(filename = savefile)