Though there is no fundamental reason to separate the likelihood ratio testing and GSVA from the differential expression analyses, they are both tasks which reach back to the primary expression data, unlike GSEA. Therefore I chose to separate these two tasks into a their own document.
EdgeR (McCarthy, Chen, and Smyth (2012)) and DESeq2 (Love, Huber, and Anders (2014)) both provide easily accessible LRT analyses. If I remember correctly, the function I wrote to simplify these analyses is aware of both, but this document primarily uses the results from DESeq2. I am reasonably certain they end up basically identical…
The likelihood ratio testing performed on the TMRC3 data is intended to look for shared patterns of expression across some known, constant factor. This may be time (are there patterns across the visits of each person), or cell type (do some genes act consistently across type). We could also ask this question of our two clinics, or indeed across the cure/fail samples.
tc_clinical_filt <- normalize_expt(tc_clinical, filter = TRUE)## Removing 5654 low-count genes (14298 remaining).
tc_lrt_clinic <- deseq_lrt(tc_clinical_filt, transform = "vst", interaction = FALSE,
interactor_column = "visitnumber",
interest_column = "clinic")## converting counts to integer mode
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 169 genes
## -- DESeq argument 'minReplicatesForReplace' = 7
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
## A large number of genes was given-- please, make sure this is not an error. Normally, only DE genes will be useful for this function.
## Working with 6279 genes.
## Working with 6279 genes after filtering: minc > 3
## Joining with `by = join_by(merge)`
## Joining with `by = join_by(merge)`
## Warning in `labels<-.dendrogram`(dend, value = value, ...): The lengths of the
## new labels is shorter than the number of leaves in the dendrogram - labels are
## recycled.
tc_lrt_clinic[["cluster_data"]][["plot"]]t_clinical_filt <- normalize_expt(t_clinical, filter=TRUE)## Removing 5796 low-count genes (14156 remaining).
lrt_visit <- deseq_lrt(t_clinical_filt, transform = "vst", interaction = FALSE,
interactor_column = "visitnumber",
interest_column = "finaloutcome")## converting counts to integer mode
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 120 genes
## -- DESeq argument 'minReplicatesForReplace' = 7
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
## A large number of genes was given-- please, make sure this is not an error. Normally, only DE genes will be useful for this function.
## Working with 5445 genes.
## Working with 5444 genes after filtering: minc > 3
## Joining with `by = join_by(merge)`
## Joining with `by = join_by(merge)`
## Warning in `labels<-.dendrogram`(dend, value = value, ...): The lengths of the
## new labels is shorter than the number of leaves in the dendrogram - labels are
## recycled.
lrt_visit$cluster_data$plotsummary(lrt_visit[["favorite_genes"]])## genes cluster
## Length:5444 Min. : 1.00
## Class :character 1st Qu.: 2.00
## Mode :character Median : 3.00
## Mean : 3.65
## 3rd Qu.: 4.00
## Max. :11.00
written <- write_xlsx(data = as.data.frame(lrt_visit[["deseq_table"]]),
excel = glue("excel/lrt_clinical_visit-v{ver}.xlsx"))lrt_monocyte_visit <- deseq_lrt(t_visitcf_monocyte, transform = "vst",
interaction = FALSE,
interactor_column = "visitnumber",
interest_column = "finaloutcome")## converting counts to integer mode
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 68 genes
## -- DESeq argument 'minReplicatesForReplace' = 7
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
## Working with 12 genes.
## Working with 12 genes after filtering: minc > 3
## Joining with `by = join_by(merge)`
## Joining with `by = join_by(merge)`
## Warning in `labels<-.dendrogram`(dend, value = value, ...): The lengths of the
## new labels is shorter than the number of leaves in the dendrogram - labels are
## recycled.
lrt_monocyte_visit[["cluster_data"]][["plot"]]lrt_neutrophil_visit <- deseq_lrt(t_visitcf_neutrophil, transform = "vst",
interaction = FALSE,
interactor_column = "visitnumber",
interest_column = "finaloutcome")## converting counts to integer mode
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 49 genes
## -- DESeq argument 'minReplicatesForReplace' = 7
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
## Working with 952 genes.
## Working with 952 genes after filtering: minc > 3
## Joining with `by = join_by(merge)`
## Joining with `by = join_by(merge)`
## Warning in `labels<-.dendrogram`(dend, value = value, ...): The lengths of the
## new labels is shorter than the number of leaves in the dendrogram - labels are
## recycled.
lrt_neutrophil_visit[["cluster_data"]][["plot"]]lrt_eosinophil_visit <- deseq_lrt(t_visitcf_eosinophil, transform = "vst",
interaction = FALSE,
interactor_column = "visitnumber",
interest_column = "finaloutcome")## converting counts to integer mode
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## Warning in deseq_lrt(t_visitcf_eosinophil, transform = "vst", interaction =
## FALSE, : There are no significant differences given the 0.05 adjusted p-value.
## Returning the full LRT table just so that you have something to look at.
Suggestion from Neal: Get residual deviance of donor+visit, visit, donor.
Oh oh, we are actually comparing the results of these models: Define terms: exprs: expression matrix with rows=genes, column=samples This is synonymous with ‘outcome’. Conversely, finaloutcome is the term we wish to explain via the metadata and expression matrix.
I tested a few models and it seems that the LRT test accepts a model of ~visit+donor.
lrt_eo_vd <- deseq_lrt(t_eosinophils, transform = "vst",
interactor_column = "visitnumber",
interest_column = "donor", interaction = FALSE)## Warning in deseq_lrt(t_eosinophils, transform = "vst", interactor_column =
## "visitnumber", : The donor should probably be a factor, set it with the
## 'factors' arg.
## converting counts to integer mode
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## Working with 44 genes.
## Working with 39 genes after filtering: minc > 3
## Joining with `by = join_by(merge)`
## Joining with `by = join_by(merge)`
## Warning: Failed to fit group 2.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 3.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 5.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 10.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 11.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 2.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 3.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 5.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 10.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 11.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 2.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 3.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 5.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 10.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 11.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 2.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 3.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 5.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 10.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning: Failed to fit group 11.
## Caused by error in `poly()`:
## ! 'degree' must be less than number of unique points
## Warning in `labels<-.dendrogram`(dend, value = value, ...): The lengths of the
## new labels is shorter than the number of leaves in the dendrogram - labels are
## recycled.
## Oh wait, I think that is the opposite of our goal, lets do the other way around.
## This creates a dataset from ~ donor + visitnumber
## and performs the LRT with ~ visitnumber
lrt_eo_dv <- deseq_lrt(t_eosinophils, transform = "vst",
interactor_column = "donor",
interest_column = "visitnumber", interaction = FALSE)## Warning in deseq_lrt(t_eosinophils, transform = "vst", interactor_column =
## "donor", : The donor should probably be a factor, set it with the 'factors'
## arg.
## converting counts to integer mode
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## A large number of genes was given-- please, make sure this is not an error. Normally, only DE genes will be useful for this function.
## Working with 6086 genes.
## Working with 6028 genes after filtering: minc > 3
## Joining with `by = join_by(merge)`Joining with `by = join_by(merge)`
## Warning in `labels<-.dendrogram`(dend, value = value, ...): The lengths of the
## new labels is shorter than the number of leaves in the dendrogram - labels are
## recycled.
Let us consider the rank of the eosinophil data. We are hypothesizing that each donor must have >= 2 samples for ~ donor + finaloutcome to work. Thus I performed table() on the experimental design and see that donor 2052 is unsuitable.
GSVA(Hänzelmann, Castelo, and Guinney (2013)) is a completely different family of analysis, and at least in my hands is used as a way to explore and look for publications which may be of interest. The general idea is that it performs a specific set of normalizations on the raw data, rank orders the results; then cross references them against extant analyses looking for over represented gene sets in specific papers or categories (e.g. reactome/GO/etc). This may either use a built-in set of mSigDB (Liberzon et al. (2011)) gene sets, or you may pull in manually downloaded (newer) data. On my workstation, I do the latter because I can leverage the downloaded gmt/xml/etc files in order to get the full annotations for the gene sets and accompanying papers. This container just uses the pre-downloaded mSigDB data (“GSVAdata” (n.d.)), which is a bit older.
Reminder: I disabled these for the moment in order to consider how to handle the (non)inclusion of the mSigDB gmt files.
If one chooses, simple_gsva() can either load signatures from the GSVAdata package, which is a little old, or load an arbitrary set. load_gmt_signatures() provides a quick way to extract them from a gmt file.
broad_c7 <- load_gmt_signatures(signatures = "reference/msigdb/c7.all.v7.5.1.entrez.gmt",
signature_category = "c7")
broad_c2 <- load_gmt_signatures(signatures = "reference/msigdb/c2.all.v7.5.1.entrez.gmt",
signature_category = "c2")
broad_h <- load_gmt_signatures(signatures = "reference/msigdb/h.all.v7.5.1.entrez.gmt",
signature_category = "h")Actually, I am going to try to semi-change my mind: GSVA comes with a dataset comprised of some older mSigDB data. Thus I will repeat each of the following blocks with an invocation which uses the older data and leave the previous invocations here so you can see what I actually ran to make pretty pictures. In the best case scenario, the results should be nearly identical with only a few categories missing between the new copy I downloaded and loaded above and the loaded-by-gsva version.
tc_celltype_gsva_c2 <- simple_gsva(
tc_valid, signatures = broad_c2,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml")
tc_celltype_gsva_c2_sig <- get_sig_gsva_categories(
tc_celltype_gsva_c2,
excel = "analyses/3_cali_and_tumaco/GSVA/tc_valid_gsva_c2.xlsx")
tc_celltype_gsva_c2_sig$subset_plot
tc_celltype_gsva_c2_sig$score_plotThere are two other important differences between these results and the results obtained when using the above invocations and the following:
heh, only 1 way to find out! Let us try!
Reminder to self, these are the current mSigDB human categories, I am not certain which (I assume all) are included in GSVAdata.
tc_celltype_gsva_c2 <- simple_gsva(
tc_valid, signature_category = "c2")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
tc_celltype_gsva_c2_sig <- get_sig_gsva_categories(
tc_celltype_gsva_c2,
excel = "analyses/3_cali_and_tumaco/GSVA/tc_valid_gsva_c2.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure. Adjust = BH
## The factor cure has 122 rows.
## The factor failure has 62 rows.
## Testing each factor against the others.
## Scoring cure against everything else.
## Scoring failure against everything else.
tc_celltype_gsva_c2_sig$subset_plottc_celltype_gsva_c2_sig$score_plotDitto for C7
tc_celltype_gsva_c7 <- simple_gsva(
tc_valid, signatures = broad_c7,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category="c7")
tc_celltype_gsva_c7_sig <- get_sig_gsva_categories(
tc_celltype_gsva_c7,
excel = "analyses/3_cali_and_tumaco/GSVA/tc_valid_gsva_c7.xlsx")tc_celltype_gsva_c7 <- simple_gsva(
tc_valid, signature_category = "c7")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
tc_celltype_gsva_c7_sig <- get_sig_gsva_categories(
tc_celltype_gsva_c7,
excel = "analyses/3_cali_and_tumaco/GSVA/tc_valid_gsva_c7.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure. Adjust = BH
## The factor cure has 122 rows.
## The factor failure has 62 rows.
## Testing each factor against the others.
## Scoring cure against everything else.
## Scoring failure against everything else.
tc_celltype_gsva_c7_sig$subset_plottc_celltype_gsva_c7_sig$score_plotI am not too fussed about the hallmark genes.
tc_celltype_gsva_h <- simple_gsva(
tc_valid,
signatures = broad_h,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "h")
tc_celltype_gsva_h_sig <- get_sig_gsva_categories(
tc_celltype_gsva_h,
excel = "analyses/3_cali_and_tumaco/GSVA/tc_valid_gsva_h.xlsx")The C2 set includes a broad array of databases and studies. Thus when we look at the various gene sets which are deemed to be poor scoring, we see things like ‘the retinoid cycle in cones’ or ‘Nicotine metabolism’.
t_clinical_gsva_c2 <- simple_gsva(
t_clinical,
signatures = broad_c2,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "c2")
t_clinical_gsva_c2_sig <- get_sig_gsva_categories(
t_clinical_gsva_c2,
excel = "analyses/4_tumaco/GSVA/t_clinical_gsva_c2.xlsx")
t_clinical_gsva_c2_sig$subset_plot
t_clinical_gsva_c2_sig$score_plott_clinical_gsva_c2 <- simple_gsva(
t_clinical, signature_category = "c2")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_clinical_gsva_c2_sig <- get_sig_gsva_categories(
t_clinical_gsva_c2,
excel = "analyses/4_tumaco/GSVA/t_clinical_gsva_c2.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure. Adjust = BH
## The factor cure has 67 rows.
## The factor failure has 56 rows.
## Testing each factor against the others.
## Scoring cure against everything else.
## Scoring failure against everything else.
t_clinical_gsva_c2_sig$subset_plott_clinical_gsva_c2_sig$score_plotThe get_sig_gsva_categories() function uses a few metrics to attempt to find msigDB categories of interest from the GSVA results. One of the methods employed which I hope will prove useful is to test the scores observed for one condition against all values in order to look for categories which are the most likely to be interesting.
With that in mind, here are the brief descriptions for a few of the most highly scored C2 groups and their associated PMIDs. Note that this are taken from the set of failed samples vs. all; there are some differences among the cure samples, but they are generally very similar.
In contrast, the C7 set is relatively focused on the immune system, and is therefore likely to be of direct interest.
t_clinical_gsva_c7 <- simple_gsva(
t_clinical,
signatures = broad_c7,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "c7")
t_clinical_gsva_c7_sig <- get_sig_gsva_categories(
t_clinical_gsva_c7,
excel = "analyses/4_tumaco/GSVA/t_clinical_gsva_c7.xlsx")t_clinical_gsva_c7 <- simple_gsva(
t_clinical, signature_category = "c7")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_clinical_gsva_c7_sig <- get_sig_gsva_categories(
t_clinical_gsva_c7,
excel = "analyses/4_tumaco/GSVA/t_clinical_gsva_c7.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure. Adjust = BH
## The factor cure has 67 rows.
## The factor failure has 56 rows.
## Testing each factor against the others.
## Scoring cure against everything else.
## Scoring failure against everything else.
With the above in mind, here are a few of the most highly scored papers/sets when scoring the failed samples:
The H set is both much larger and smaller, it is comprised of just 50 sets, but they have many more genes associated with them.
t_clinical_gsva_h <- simple_gsva(
t_clinical,
signatures = broad_h,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category="h")
t_clinical_gsva_h_sig <- get_sig_gsva_categories(
t_clinical_gsva_h,
excel = "analyses/4_tumaco/GSVA/t_clinical_gsva_h.xlsx")t_clinical_gsva_h <- simple_gsva(
t_clinical, signature_category = "h")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_clinical_gsva_h_sig <- get_sig_gsva_categories(
t_clinical_gsva_h,
excel = "analyses/4_tumaco/GSVA/t_clinical_gsva_h.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure. Adjust = BH
## The factor cure has 67 rows.
## The factor failure has 56 rows.
## Testing each factor against the others.
## Scoring cure against everything else.
## Scoring failure against everything else.
t_biopsy_gsva_c2 <- simple_gsva(
t_biopsies,
signatures = broad_c2,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category="c2")
t_biopsy_gsva_c2_sig <- get_sig_gsva_categories(
t_biopsy_gsva_c2,
excel = "analyses/4_tumaco/GSVA/t_biopsy_gsva_c2.xlsx")t_biopsy_gsva_c2 <- simple_gsva(
t_biopsies, signature_category = "c2")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_biopsy_gsva_c2_sig <- get_sig_gsva_categories(
t_biopsy_gsva_c2,
excel = "analyses/4_tumaco/GSVA/t_biopsy_gsva_c2.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 9 rows.
## The factor tumaco_failure has 5 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_biopsy_gsva_c7 <- simple_gsva(
t_biopsies,
signatures = broad_c7,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "c7")
t_biopsy_gsva_c7_sig <- get_sig_gsva_categories(
t_biopsy_gsva_c7,
excel = "analyses/4_tumaco/GSVA/t_biopsy_gsva_c7.xlsx")t_biopsy_gsva_c7 <- simple_gsva(
t_biopsies, signature_category = "c7")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_biopsy_gsva_c7_sig <- get_sig_gsva_categories(
t_biopsy_gsva_c7,
excel = "analyses/4_tumaco/GSVA/t_biopsy_gsva_c7.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 9 rows.
## The factor tumaco_failure has 5 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_biopsy_gsva_h <- simple_gsva(
t_biopsies,
signatures = broad_h,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "h")
t_biopsy_gsva_h_sig <- get_sig_gsva_categories(
t_biopsy_gsva_h,
excel = "analyses/4_tumaco/GSVA/t_biopsy_gsva_h.xlsx")t_biopsy_gsva_h <- simple_gsva(
t_biopsies, signature_category = "h")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_biopsy_gsva_h_sig <- get_sig_gsva_categories(
t_biopsy_gsva_h,
excel = "analyses/4_tumaco/GSVA/t_biopsy_gsva_h.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 9 rows.
## The factor tumaco_failure has 5 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_eosinophil_gsva_c7 <- simple_gsva(
t_eosinophils,
signatures = broad_c7,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "c7")
t_eosinophil_gsva_c7_sig <- get_sig_gsva_categories(
t_eosinophil_gsva_c7,
excel = "analyses/4_tumaco/GSVA/t_eosinophil_gsva_c7.xlsx")t_eosinophil_gsva_c7 <- simple_gsva(
t_eosinophils, signature_category = "c7")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_eosinophil_gsva_c7_sig <- get_sig_gsva_categories(
t_eosinophil_gsva_c7,
excel = "analyses/4_tumaco/GSVA/t_eosinophil_gsva_c7.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 17 rows.
## The factor tumaco_failure has 9 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_eosinophil_gsva_c2 <- simple_gsva(
t_eosinophils,
signatures = broad_c2,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "c2")
t_eosinophil_gsva_c2_sig <- get_sig_gsva_categories(
t_eosinophil_gsva_c2,
excel = "analyses/4_tumaco/GSVA/t_eosinophil_gsva_c2.xlsx")t_eosinophil_gsva_c2 <- simple_gsva(
t_eosinophils, signature_category = "c2")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_eosinophil_gsva_c2_sig <- get_sig_gsva_categories(
t_eosinophil_gsva_c2,
excel = "analyses/4_tumaco/GSVA/t_eosinophil_gsva_c2.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 17 rows.
## The factor tumaco_failure has 9 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_eosinophil_gsva_h <- simple_gsva(
t_eosinophils,
signatures = broad_h,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "h")
t_eosinophil_gsva_h_sig <- get_sig_gsva_categories(
t_eosinophil_gsva_h,
excel = "analyses/4_tumaco/GSVA/t_eosinophil_gsva_h.xlsx")t_eosinophil_gsva_h <- simple_gsva(
t_eosinophils, signature_category = "h")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_eosinophil_gsva_h_sig <- get_sig_gsva_categories(
t_eosinophil_gsva_h,
excel = "analyses/4_tumaco/GSVA/t_eosinophil_gsva_h.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 17 rows.
## The factor tumaco_failure has 9 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_monocyte_gsva_c7 <- simple_gsva(
t_monocytes,
signatures = broad_c7,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "c7")
t_monocyte_gsva_c7_sig <- get_sig_gsva_categories(
t_monocyte_gsva_c7,
excel = "analyses/4_tumaco/GSVA/t_monocyte_gsva_c7.xlsx")t_monocyte_gsva_c7 <- simple_gsva(
t_monocytes, signature_category = "c7")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_monocyte_gsva_c7_sig <- get_sig_gsva_categories(
t_monocyte_gsva_c7,
excel = "analyses/4_tumaco/GSVA/t_monocyte_gsva_c7.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 21 rows.
## The factor tumaco_failure has 21 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_monocyte_gsva_c2 <- simple_gsva(
t_monocytes,
signatures = broad_c2,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "c2")
t_monocyte_gsva_c2_sig <- get_sig_gsva_categories(
t_monocyte_gsva_c2,
excel = "analyses/4_tumaco/GSVA/t_monocyte_gsva_c2.xlsx")t_monocyte_gsva_c2 <- simple_gsva(
t_monocytes, signature_category = "c2")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_monocyte_gsva_c2_sig <- get_sig_gsva_categories(
t_monocyte_gsva_c2,
excel = "analyses/4_tumaco/GSVA/t_monocyte_gsva_c2.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 21 rows.
## The factor tumaco_failure has 21 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_monocyte_gsva_h <- simple_gsva(
t_monocytes,
signatures = broad_h,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "h")
t_monocyte_gsva_h_sig <- get_sig_gsva_categories(
t_monocyte_gsva_h,
excel = "analyses/4_tumaco/GSVA/t_monocyte_gsva_h.xlsx")t_monocyte_gsva_h <- simple_gsva(
t_monocytes, signature_category = "h")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_monocyte_gsva_h_sig <- get_sig_gsva_categories(
t_monocyte_gsva_h,
excel = "analyses/4_tumaco/GSVA/t_monocyte_gsva_h.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 21 rows.
## The factor tumaco_failure has 21 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_neutrophil_gsva_c7 <- simple_gsva(
t_neutrophils,
signatures = broad_c7,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "c7")
t_neutrophil_gsva_c7_sig <- get_sig_gsva_categories(
t_neutrophil_gsva_c7,
excel = "analyses/4_tumaco/GSVA/t_neutrophil_gsva_c7.xlsx")t_neutrophil_gsva_c7 <- simple_gsva(
t_neutrophils, signature_category = "c7")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_neutrophil_gsva_c7_sig <- get_sig_gsva_categories(
t_neutrophil_gsva_c7,
excel = "analyses/4_tumaco/GSVA/t_neutrophil_gsva_c7.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 20 rows.
## The factor tumaco_failure has 21 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_neutrophil_gsva_c2 <- simple_gsva(
t_neutrophils,
signatures = broad_c2,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "c2")
t_neutrophil_gsva_c2_sig <- get_sig_gsva_categories(
t_neutrophil_gsva_c2,
excel = "analyses/4_tumaco/GSVA/t_neutrophil_gsva_c2.xlsx")t_neutrophil_gsva_c2 <- simple_gsva(
t_neutrophils, signature_category = "c2")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_neutrophil_gsva_c2_sig <- get_sig_gsva_categories(
t_neutrophil_gsva_c2,
excel = "analyses/4_tumaco/GSVA/t_neutrophil_gsva_c2.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 20 rows.
## The factor tumaco_failure has 21 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.
t_neutrophil_gsva_h <- simple_gsva(
t_neutrophils,
signatures = broad_h,
msig_xml = "reference/msigdb/msigdb_v7.5.1.xml",
signature_category = "h")
t_neutrophil_gsva_h_sig <- get_sig_gsva_categories(
t_neutrophil_gsva_h,
excel = "analyses/4_tumaco/GSVA/t_neutrophil_gsva_h.xlsx")t_neutrophil_gsva_h <- simple_gsva(
t_neutrophils, signature_category = "h")## Converting the rownames() of the expressionset to ENTREZID.
## 574 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19952 entries.
## After conversion, the expressionset has 19337 entries.
## Adding descriptions and IDs to the gene set annotations.
t_neutrophil_gsva_h_sig <- get_sig_gsva_categories(
t_neutrophil_gsva_h,
excel = "analyses/4_tumaco/GSVA/t_neutrophil_gsva_h.xlsx")## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: tumacofailure_vs_tumacocure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: tumacocure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: tumacofailure. Adjust = BH
## The factor tumaco_cure has 20 rows.
## The factor tumaco_failure has 21 rows.
## Testing each factor against the others.
## Scoring tumaco_cure against everything else.
## Scoring tumaco_failure against everything else.