Last updated: 2023-11-07
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The results in this page were generated with repository version 4434f01. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
These are the previous versions of the repository in which changes were made to the R Markdown (`analysis/gdc.Rmd`) and HTML (`docs/gdc.html`) files. If you’ve configured a remote Git repository (see `?wflow_git_remote`), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 4434f01 | Dave Tang | 2023-11-07 | Filter for open access |
html | fb01562 | Dave Tang | 2023-11-06 | Build site. |
Rmd | c759821 | Dave Tang | 2023-11-06 | Additional clinical data |
html | 59cfc19 | Dave Tang | 2023-11-06 | Build site. |
Rmd | a9ee937 | Dave Tang | 2023-11-06 | Additional cancers |
html | 87ee57e | Dave Tang | 2023-11-06 | Build site. |
Rmd | f22f94c | Dave Tang | 2023-11-06 | Link to cases |
html | 131a349 | Dave Tang | 2023-11-01 | Build site. |
Rmd | 705fefa | Dave Tang | 2023-11-01 | Treatments |
html | 2f8ef49 | Dave Tang | 2023-11-01 | Build site. |
Rmd | 75030f1 | Dave Tang | 2023-11-01 | Treatment type |
html | 8fca622 | Dave Tang | 2023-11-01 | Build site. |
Rmd | 3fc037e | Dave Tang | 2023-11-01 | Using the GenomicDataCommons package |
The National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is a data sharing platform that promotes precision medicine in oncology. It is not just a database or a tool; it is an expandable knowledge network supporting the import and standardisation of genomic and clinical data from cancer research programs. The GDC contains NCI-generated data from some of the largest and most comprehensive cancer genomic datasets, including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Therapies (TARGET). For the first time, these datasets have been harmonised using a common set of bioinformatics pipelines, so that the data can be directly compared. As a growing knowledge system for cancer, the GDC also enables researchers to submit data, and harmonises these data for import into the GDC. As more researchers add clinical and genomic data to the GDC, it will become an even more powerful tool for making discoveries about the molecular basis of cancer that may lead to better care for patients.
The GenomicDataCommons Bioconductor package provides basic infrastructure for querying, accessing, and mining genomic datasets available from the GDC.
See The GDC API page.
Install the `GenomicDataCommons` package using `BiocManager`.
if (! "GenomicDataCommons" %in% installed.packages()[, 1]){
BiocManager::install("GenomicDataCommons")
}
library(GenomicDataCommons)
packageVersion("GenomicDataCommons")
[1] '1.26.0'
Check status to see if we can query the GDC.
GenomicDataCommons::status()
$commit
[1] "023da73eee3c17608db1a9903c82852428327b88"
$data_release
[1] "Data Release 38.0 - August 31, 2023"
$status
[1] "OK"
$tag
[1] "5.0.6"
$version
[1] 1
stopifnot(GenomicDataCommons::status()$status=="OK")
The following code builds a manifest that can be used to guide the download of raw data. Here, filtering finds open-access gene expression files quantified as raw counts using STAR from TCGA ovarian cancer patients.
ge_manifest <- files() %>%
filter(cases.project.project_id == 'TCGA-OV') %>%
filter(type == 'gene_expression' ) %>%
filter(access == 'open') %>%
filter(analysis.workflow_type == 'STAR - Counts') %>%
manifest()
DT::datatable(ge_manifest)
The `gdcdata` function is used to download GDC files.
fnames <- lapply(ge_manifest$id[1:3], gdcdata)
fnames
[[1]]
96aca0af-a776-460d-95ff-87e364e4ac99
"~/.cache/GenomicDataCommons/96aca0af-a776-460d-95ff-87e364e4ac99/21ff9928-00f0-4b96-8d70-35e9bfad5d40.rna_seq.augmented_star_gene_counts.tsv"
[[2]]
b668c86b-fa56-4d39-9529-5b47081a3faa
"~/.cache/GenomicDataCommons/b668c86b-fa56-4d39-9529-5b47081a3faa/41bdbd88-b4b2-4884-8a44-b34656ae4156.rna_seq.augmented_star_gene_counts.tsv"
[[3]]
60678f17-e3d7-40cd-99ff-73706497968a
"~/.cache/GenomicDataCommons/60678f17-e3d7-40cd-99ff-73706497968a/03c8e4fe-1e07-4ea3-a154-c17c2e8af508.rna_seq.augmented_star_gene_counts.tsv"
Files are downloaded and stored in the directory specified by `gdc_cache()`.
gdc_cache()
[1] "~/.cache/GenomicDataCommons"
Tally the total number of STAR gene count files that are open for download.
open_star_manifest <- files() %>%
filter(analysis.workflow_type == 'STAR - Counts') %>%
filter(access == 'open') %>%
manifest()
dim(open_star_manifest)
[1] 23111 16
Queries in the `GenomicDataCommons` package follow the four metadata endpoints available at the GDC; there are four convenience functions that each create `GDCQuery` objects:
projects()
cases()
files()
annotations()
Each of the four endpoints (projects, cases, files, and annotations) has its own set of associated fields. These are the default fields.
endpoints <- c("projects", "cases", "files", "annotations")
sapply(endpoints, default_fields)
$projects
[1] "dbgap_accession_number" "disease_type" "intended_release_date"
[4] "name" "primary_site" "project_autocomplete"
[7] "project_id" "releasable" "released"
[10] "state"
$cases
[1] "aliquot_ids" "analyte_ids"
[3] "case_autocomplete" "case_id"
[5] "consent_type" "created_datetime"
[7] "days_to_consent" "days_to_lost_to_followup"
[9] "diagnosis_ids" "disease_type"
[11] "index_date" "lost_to_followup"
[13] "portion_ids" "primary_site"
[15] "sample_ids" "slide_ids"
[17] "state" "submitter_aliquot_ids"
[19] "submitter_analyte_ids" "submitter_diagnosis_ids"
[21] "submitter_id" "submitter_portion_ids"
[23] "submitter_sample_ids" "submitter_slide_ids"
[25] "updated_datetime"
$files
[1] "access" "acl"
[3] "average_base_quality" "average_insert_size"
[5] "average_read_length" "channel"
[7] "chip_id" "chip_position"
[9] "contamination" "contamination_error"
[11] "created_datetime" "data_category"
[13] "data_format" "data_type"
[15] "error_type" "experimental_strategy"
[17] "file_autocomplete" "file_id"
[19] "file_name" "file_size"
[21] "imaging_date" "magnification"
[23] "md5sum" "mean_coverage"
[25] "msi_score" "msi_status"
[27] "pairs_on_diff_chr" "plate_name"
[29] "plate_well" "platform"
[31] "proc_internal" "proportion_base_mismatch"
[33] "proportion_coverage_10x" "proportion_coverage_10X"
[35] "proportion_coverage_30x" "proportion_coverage_30X"
[37] "proportion_reads_duplicated" "proportion_reads_mapped"
[39] "proportion_targets_no_coverage" "read_pair_number"
[41] "revision" "stain_type"
[43] "state" "state_comment"
[45] "submitter_id" "tags"
[47] "total_reads" "tumor_ploidy"
[49] "tumor_purity" "type"
[51] "updated_datetime" "wgs_coverage"
$annotations
[1] "annotation_autocomplete" "annotation_id"
[3] "case_id" "case_submitter_id"
[5] "category" "classification"
[7] "created_datetime" "entity_id"
[9] "entity_submitter_id" "entity_type"
[11] "legacy_created_datetime" "legacy_updated_datetime"
[13] "notes" "state"
[15] "status" "submitter_id"
[17] "updated_datetime"
Available fields for each endpoint.
all_fields <- sapply(endpoints, available_fields)
names(all_fields) <- endpoints
sapply(all_fields, length)
projects cases files annotations
22 1001 1022 30
These fields can be used for filtering purposes.
head(all_fields$files)
[1] "access" "acl"
[3] "analysis.analysis_id" "analysis.analysis_type"
[5] "analysis.created_datetime" "analysis.input_files.access"
Use the `facet` function to aggregate on the values used for a particular field.
files() %>% facet("access") %>% aggregations()
$access
doc_count key
1 678416 controlled
2 325331 open
Use `grep` to search for fields of interest, for example “project”.
grep("project", all_fields$files, ignore.case = TRUE, value = TRUE)
[1] "cases.project.dbgap_accession_number"
[2] "cases.project.disease_type"
[3] "cases.project.intended_release_date"
[4] "cases.project.name"
[5] "cases.project.primary_site"
[6] "cases.project.program.dbgap_accession_number"
[7] "cases.project.program.name"
[8] "cases.project.program.program_id"
[9] "cases.project.project_id"
[10] "cases.project.releasable"
[11] "cases.project.released"
[12] "cases.project.state"
[13] "cases.tissue_source_site.project"
Look for fields related to collection, such as “days_to_collection”.
grep("collection", all_fields$cases, ignore.case = TRUE, value = TRUE)
[1] "samples.days_to_collection" "samples.tissue_collection_type"
Look for “workflow_type”.
grep("workflow_type", all_fields$cases, ignore.case = TRUE, value = TRUE)
[1] "files.analysis.metadata.read_groups.read_group_qcs.workflow_type"
[2] "files.analysis.workflow_type"
[3] "files.downstream_analyses.workflow_type"
Look for “treatment”.
grep("treatment", all_fields$cases, ignore.case = TRUE, value = TRUE)
[1] "diagnoses.prior_treatment"
[2] "diagnoses.treatments.chemo_concurrent_to_radiation"
[3] "diagnoses.treatments.created_datetime"
[4] "diagnoses.treatments.days_to_treatment_end"
[5] "diagnoses.treatments.days_to_treatment_start"
[6] "diagnoses.treatments.initial_disease_status"
[7] "diagnoses.treatments.number_of_cycles"
[8] "diagnoses.treatments.reason_treatment_ended"
[9] "diagnoses.treatments.regimen_or_line_of_therapy"
[10] "diagnoses.treatments.route_of_administration"
[11] "diagnoses.treatments.state"
[12] "diagnoses.treatments.submitter_id"
[13] "diagnoses.treatments.therapeutic_agents"
[14] "diagnoses.treatments.treatment_anatomic_site"
[15] "diagnoses.treatments.treatment_arm"
[16] "diagnoses.treatments.treatment_dose"
[17] "diagnoses.treatments.treatment_dose_units"
[18] "diagnoses.treatments.treatment_effect"
[19] "diagnoses.treatments.treatment_effect_indicator"
[20] "diagnoses.treatments.treatment_frequency"
[21] "diagnoses.treatments.treatment_id"
[22] "diagnoses.treatments.treatment_intent_type"
[23] "diagnoses.treatments.treatment_or_therapy"
[24] "diagnoses.treatments.treatment_outcome"
[25] "diagnoses.treatments.treatment_type"
[26] "diagnoses.treatments.updated_datetime"
[27] "follow_ups.diabetes_treatment_type"
[28] "follow_ups.haart_treatment_indicator"
[29] "follow_ups.immunosuppressive_treatment_type"
[30] "follow_ups.reflux_treatment_type"
[31] "follow_ups.risk_factor_treatment"
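The searches above repeat the same `grep` call against one endpoint at a time. A minimal sketch of a helper that searches every endpoint in one go is shown below; `find_fields` and the toy field list are illustrative names and values, not part of the GenomicDataCommons package. In the analysis itself, `all_fields` would be passed in place of `toy_fields`.

```r
# Hypothetical helper: search a named list of field vectors (one per endpoint)
# and keep only the endpoints that have at least one match.
find_fields <- function(pattern, fields_by_endpoint) {
  hits <- lapply(
    fields_by_endpoint,
    function(f) grep(pattern, f, ignore.case = TRUE, value = TRUE)
  )
  hits[lengths(hits) > 0]
}

# Toy stand-in for all_fields, using a few field names shown above.
toy_fields <- list(
  projects = c("project_id", "name"),
  cases    = c("samples.days_to_collection", "files.analysis.workflow_type"),
  files    = c("analysis.workflow_type", "access")
)

workflow_hits <- find_fields("workflow_type", toy_fields)
workflow_hits
```

Endpoints without a match are dropped, so the result only lists where a field can actually be used.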
Note that the components of each field name above are separated by a period (`.`); this indicates the hierarchical structure. Summarise the top-level fields by using `sub`.
unique(sub("^(\\w+)\\..*", "\\1", all_fields$cases))
[1] "aliquot_ids" "analyte_ids"
[3] "annotations" "case_autocomplete"
[5] "case_id" "consent_type"
[7] "created_datetime" "days_to_consent"
[9] "days_to_lost_to_followup" "demographic"
[11] "diagnoses" "diagnosis_ids"
[13] "disease_type" "exposures"
[15] "family_histories" "files"
[17] "follow_ups" "index_date"
[19] "lost_to_followup" "portion_ids"
[21] "primary_site" "project"
[23] "sample_ids" "samples"
[25] "slide_ids" "state"
[27] "submitter_aliquot_ids" "submitter_analyte_ids"
[29] "submitter_diagnosis_ids" "submitter_id"
[31] "submitter_portion_ids" "submitter_sample_ids"
[33] "submitter_slide_ids" "summary"
[35] "tissue_source_site" "updated_datetime"
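To see what the regular expression does, here is a small offline sketch on a handful of field names taken from the output above (the vector `fields` is a toy subset, not the full `all_fields$cases`). The pattern captures everything up to the first period; names without a period are returned unchanged. Tabulating the result additionally shows how many fields sit under each top-level entry.

```r
# Toy subset of case fields copied from the output above.
fields <- c(
  "case_id",
  "diagnoses.prior_treatment",
  "diagnoses.treatments.treatment_type",
  "follow_ups.risk_factor_treatment",
  "samples.days_to_collection"
)

# Keep only the part before the first period.
top_level <- sub("^(\\w+)\\..*", "\\1", fields)

unique(top_level)

# Number of fields under each top-level entry.
field_counts <- table(top_level)
field_counts
```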
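Aggregations are computed on one field at a time; when several fields are passed to `facet`, each is tallied independently rather than cross-tabulated.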
files() %>% facet(c("type", "data_format")) %>% aggregations()
$data_format
doc_count key
1 188265 tsv
2 184432 vcf
3 163225 maf
4 149745 bam
5 123119 txt
6 52733 bedpe
7 32898 svs
8 32708 idat
9 24236 cel
10 24002 bcr xml
11 11324 pdf
12 10755 bcr ssf xml
13 2884 bcr auxiliary xml
14 1051 bcr omf xml
15 805 cdc json
16 602 bcr biotab
17 568 bcr pps xml
18 215 jpeg 2000
19 74 mex
20 70 xlsx
21 36 hdf5
$type
doc_count key
1 197177 annotated_somatic_mutation
2 149745 aligned_reads
3 98319 structural_variation
4 94773 simple_somatic_mutation
5 71861 copy_number_segment
6 69806 copy_number_estimate
7 46580 gene_expression
8 34661 aggregated_somatic_mutation
9 34408 mirna_expression
10 33113 slide_image
11 32708 masked_methylation_array
12 26978 biospecimen_supplement
13 24236 submitted_genotyping_array
14 23135 simple_germline_variation
15 16657 masked_somatic_mutation
16 16354 methylation_beta_value
17 13898 clinical_supplement
18 11324 pathology_report
19 7906 protein_expression
20 108 secondary_expression_analysis
Aggregate on a sub-field.
cases() %>%
filter(files.access == 'open') %>%
facet("diagnoses.treatments.treatment_type") %>%
aggregations()
$diagnoses.treatments.treatment_type
doc_count key
1 12170 radiation therapy, nos
2 11994 pharmaceutical therapy, nos
3 465 chemotherapy
4 520 stem cell transplantation, autologous
5 296 surgery, nos
6 171 targeted molecular therapy
7 168 immunotherapy (including vaccines)
8 96 radiation, external beam
9 53 brachytherapy, low dose
10 38 hormone therapy
11 33 brachytherapy, high dose
12 14 stem cell transplantation, allogeneic
13 9 radiation, 2d conventional
14 7 radiation, 3d conformal
15 6 radiation, intensity-modulated radiotherapy
16 5 radiation, stereotactic/gamma knife/srs
17 3 stereotactic radiosurgery
18 1 ablation, radiofrequency
19 1 external beam radiation
20 1 peptide receptor radionuclide therapy (prrt)
21 1 radiation, proton beam
22 30737 _missing
Facet on `analysis.workflow_type` for open files.
files() %>%
filter(access == 'open') %>%
facet("analysis.workflow_type") %>%
aggregations()
$analysis.workflow_type
doc_count key
1 49062 SeSAMe Methylation Beta Estimation
2 45258 DNAcopy
3 34408 BCGSC miRNA Profiling
4 23164 ASCAT2
5 23111 STAR - Counts
6 21264 ASCAT3
7 16522 Aliquot Ensemble Somatic Variant Merging and Masking
8 10677 ABSOLUTE LiftOver
9 8776 AscatNGS
10 108 Seurat - 10x Chromium
11 38 CellRanger - 10x Raw Counts
12 36 CellRanger - 10x Filtered Counts
13 92907 _missing
Facet on `experimental_strategy` for open files.
files() %>%
filter(access == 'open') %>%
facet("experimental_strategy") %>%
aggregations()
$experimental_strategy
doc_count key
1 100363 Genotyping Array
2 49062 Methylation Array
3 34408 miRNA-Seq
4 23111 RNA-Seq
5 21348 Tissue Slide
6 16075 WXS
7 11765 Diagnostic Slide
8 8776 WGS
9 7906 Reverse Phase Protein Array
10 447 Targeted Sequencing
11 182 scRNA-Seq
12 51888 _missing
All BAM files are under controlled access.
files() %>%
filter(data_format == 'bam') %>%
facet("access") %>%
aggregations()
$access
doc_count key
1 149745 controlled
All VCF files are also under controlled access.
files() %>%
filter(data_format == 'vcf') %>%
facet("access") %>%
aggregations()
$access
doc_count key
1 184432 controlled
Mutation Annotation Format (MAF) files are openly available. These are tab-delimited text files with aggregated mutation information from VCF files. All open-access WXS files are in MAF format.
files() %>%
filter(access == 'open') %>%
filter(experimental_strategy == 'WXS') %>%
facet("data_format") %>%
aggregations()
$data_format
doc_count key
1 16075 maf
Project fields.
all_fields$projects
[1] "dbgap_accession_number"
[2] "disease_type"
[3] "intended_release_date"
[4] "name"
[5] "primary_site"
[6] "program.dbgap_accession_number"
[7] "program.name"
[8] "program.program_id"
[9] "project_autocomplete"
[10] "project_id"
[11] "releasable"
[12] "released"
[13] "state"
[14] "summary.case_count"
[15] "summary.data_categories.case_count"
[16] "summary.data_categories.data_category"
[17] "summary.data_categories.file_count"
[18] "summary.experimental_strategies.case_count"
[19] "summary.experimental_strategies.experimental_strategy"
[20] "summary.experimental_strategies.file_count"
[21] "summary.file_count"
[22] "summary.file_size"
Use `projects` to fetch project information and `ids` to list all available projects.
projects() %>% results_all() -> project_info
sort(ids(project_info))
[1] "APOLLO-LUAD" "BEATAML1.0-COHORT"
[3] "BEATAML1.0-CRENOLANIB" "CDDP_EAGLE-1"
[5] "CGCI-BLGSP" "CGCI-HTMCP-CC"
[7] "CGCI-HTMCP-DLBCL" "CGCI-HTMCP-LC"
[9] "CMI-ASC" "CMI-MBC"
[11] "CMI-MPC" "CPTAC-2"
[13] "CPTAC-3" "CTSP-DLBCL1"
[15] "EXCEPTIONAL_RESPONDERS-ER" "FM-AD"
[17] "GENIE-DFCI" "GENIE-GRCC"
[19] "GENIE-JHU" "GENIE-MDA"
[21] "GENIE-MSK" "GENIE-NKI"
[23] "GENIE-UHN" "GENIE-VICC"
[25] "HCMI-CMDC" "MATCH-B"
[27] "MATCH-N" "MATCH-Q"
[29] "MATCH-Y" "MATCH-Z1D"
[31] "MMRF-COMMPASS" "MP2PRT-ALL"
[33] "MP2PRT-WT" "NCICCR-DLBCL"
[35] "OHSU-CNL" "ORGANOID-PANCREATIC"
[37] "REBC-THYR" "TARGET-ALL-P1"
[39] "TARGET-ALL-P2" "TARGET-ALL-P3"
[41] "TARGET-AML" "TARGET-CCSK"
[43] "TARGET-NBL" "TARGET-OS"
[45] "TARGET-RT" "TARGET-WT"
[47] "TCGA-ACC" "TCGA-BLCA"
[49] "TCGA-BRCA" "TCGA-CESC"
[51] "TCGA-CHOL" "TCGA-COAD"
[53] "TCGA-DLBC" "TCGA-ESCA"
[55] "TCGA-GBM" "TCGA-HNSC"
[57] "TCGA-KICH" "TCGA-KIRC"
[59] "TCGA-KIRP" "TCGA-LAML"
[61] "TCGA-LGG" "TCGA-LIHC"
[63] "TCGA-LUAD" "TCGA-LUSC"
[65] "TCGA-MESO" "TCGA-OV"
[67] "TCGA-PAAD" "TCGA-PCPG"
[69] "TCGA-PRAD" "TCGA-READ"
[71] "TCGA-SARC" "TCGA-SKCM"
[73] "TCGA-STAD" "TCGA-TGCT"
[75] "TCGA-THCA" "TCGA-THYM"
[77] "TCGA-UCEC" "TCGA-UCS"
[79] "TCGA-UVM" "TRIO-CRU"
[81] "VAREPOP-APOLLO" "WCDT-MCRPC"
The `results()` method will fetch actual results.
projects() %>% results(size = 10) -> my_proj
str(my_proj, max.level = 1)
List of 9
$ id : chr [1:10] "CGCI-HTMCP-CC" "TARGET-AML" "GENIE-JHU" "GENIE-MSK" ...
$ primary_site :List of 10
$ dbgap_accession_number: chr [1:10] "phs000528" "phs000465" NA NA ...
$ project_id : chr [1:10] "CGCI-HTMCP-CC" "TARGET-AML" "GENIE-JHU" "GENIE-MSK" ...
$ disease_type :List of 10
$ name : chr [1:10] "HIV+ Tumor Molecular Characterization Project - Cervical Cancer" "Acute Myeloid Leukemia" "AACR Project GENIE - Contributed by Johns Hopkins Sidney Kimmel Comprehensive Cancer Center" "AACR Project GENIE - Contributed by Memorial Sloan Kettering Cancer Center" ...
$ releasable : logi [1:10] TRUE TRUE TRUE TRUE TRUE TRUE ...
$ state : chr [1:10] "open" "open" "open" "open" ...
$ released : logi [1:10] TRUE TRUE TRUE TRUE TRUE TRUE ...
- attr(*, "row.names")= int [1:10] 1 2 3 4 5 6 7 8 9 10
- attr(*, "class")= chr [1:3] "GDCprojectsResults" "GDCResults" "list"
my_proj$project_id
[1] "CGCI-HTMCP-CC" "TARGET-AML" "GENIE-JHU" "GENIE-MSK"
[5] "GENIE-VICC" "GENIE-MDA" "TCGA-MESO" "TARGET-ALL-P3"
[9] "TCGA-UVM" "TCGA-KICH"
The `gdc_clinical` function:
The NCI GDC has a complex data model that allows various studies to supply numerous clinical and demographic data elements. However, across all projects that enter the GDC, there are similarities. This function returns four data.frames associated with case_ids from the GDC.
Accessing clinical data.
case_ids <- cases() %>% results(size=10) %>% ids()
clindat <- gdc_clinical(case_ids)
names(clindat)
[1] "demographic" "diagnoses" "exposures" "main"
Demographic.
idx <- apply(clindat$demographic, 2, function(x) all(is.na(x)))
DT::datatable(clindat$demographic[, !idx])
Diagnoses data.
idx <- apply(clindat$diagnoses, 2, function(x) all(is.na(x)))
DT::datatable(clindat$diagnoses[, !idx])
Exposures data.
idx <- apply(clindat$exposures, 2, function(x) all(is.na(x)))
DT::datatable(clindat$exposures[, !idx])
Main data.
idx <- apply(clindat$main, 2, function(x) all(is.na(x)))
DT::datatable(clindat$main[, !idx])
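The four chunks above repeat the same step: drop columns that are entirely `NA`. Note that `apply(df, 2, ...)` first coerces the data frame to a matrix, and `df[, !idx]` collapses to a plain vector if only one column survives. A small helper avoids both; this is a sketch only, and `drop_empty_cols` and the toy data frame are illustrative rather than part of GenomicDataCommons.

```r
# Hypothetical helper: remove columns in which every value is NA.
drop_empty_cols <- function(df) {
  keep <- !vapply(df, function(x) all(is.na(x)), logical(1))
  # drop = FALSE keeps a data frame even when a single column remains.
  df[, keep, drop = FALSE]
}

# Toy data frame standing in for one of the clindat tables.
toy <- data.frame(
  case_id = c("a", "b"),
  gender = c("female", NA),
  year_of_death = c(NA, NA),
  stringsAsFactors = FALSE
)

cleaned <- drop_empty_cols(toy)
cleaned
```

With the real data this could be applied to all four tables at once, for example `lapply(clindat, drop_empty_cols)`.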
Find all files related to a specific case, or sample donor.
case1 <- cases() %>% results(size=1)
str(case1, max.level = 1)
List of 25
$ id : chr "935ca1d3-2445-4f59-95a6-19f3311c1900"
$ lost_to_followup : chr "No"
$ slide_ids :List of 1
$ submitter_slide_ids :List of 1
$ days_to_lost_to_followup: logi NA
$ disease_type : chr "Squamous Cell Neoplasms"
$ analyte_ids :List of 1
$ submitter_id : chr "HTMCP-03-06-02345"
$ submitter_analyte_ids :List of 1
$ days_to_consent : logi NA
$ aliquot_ids :List of 1
$ submitter_aliquot_ids :List of 1
$ created_datetime : chr "2019-11-21T18:06:42.617487-06:00"
$ diagnosis_ids :List of 1
$ sample_ids :List of 1
$ consent_type : logi NA
$ submitter_sample_ids :List of 1
$ primary_site : chr "Cervix uteri"
$ submitter_diagnosis_ids :List of 1
$ updated_datetime : chr "2020-04-28T11:49:05.699379-05:00"
$ case_id : chr "935ca1d3-2445-4f59-95a6-19f3311c1900"
$ index_date : chr "Diagnosis"
$ state : chr "released"
$ portion_ids :List of 1
$ submitter_portion_ids :List of 1
- attr(*, "row.names")= int 1
- attr(*, "class")= chr [1:3] "GDCcasesResults" "GDCResults" "list"
Sample IDs.
case1$sample_ids
$`935ca1d3-2445-4f59-95a6-19f3311c1900`
[1] "f7706af8-c4e6-4e94-95f1-b6b4901dfe28"
[2] "bb3365f7-7bf9-46c6-ac60-4b7e77268ed8"
[3] "a35a4c87-86f9-4400-b43a-2b0999c69c19"
All case fields.
case_fields <- available_fields("cases")
Grep `case_fields`.
grep("sample_ids", case_fields, value = TRUE)
[1] "sample_ids" "submitter_sample_ids"
grep("sample_type", case_fields, value = TRUE)
[1] "samples.sample_type" "samples.sample_type_id"
grep("workflow_type", case_fields, value = TRUE)
[1] "files.analysis.metadata.read_groups.read_group_qcs.workflow_type"
[2] "files.analysis.workflow_type"
[3] "files.downstream_analyses.workflow_type"
Get case data.
n_star_cases <- cases() %>%
filter(files.analysis.workflow_type == 'STAR - Counts') %>%
filter(files.access == 'open') %>%
count()
star_cases <- cases() %>%
filter(files.analysis.workflow_type == 'STAR - Counts') %>%
filter(files.access == 'open') %>%
results(size = n_star_cases)
sapply(star_cases, length)
id lost_to_followup slide_ids
19101 19101 19101
submitter_slide_ids days_to_lost_to_followup disease_type
19101 19101 19101
analyte_ids submitter_id submitter_analyte_ids
19101 19101 19101
days_to_consent aliquot_ids submitter_aliquot_ids
19101 19101 19101
created_datetime diagnosis_ids sample_ids
19101 19101 19101
consent_type submitter_sample_ids primary_site
19101 19101 19101
submitter_diagnosis_ids updated_datetime case_id
19101 19101 19101
index_date state portion_ids
19101 19101 19101
submitter_portion_ids
19101
`case_id` is the same as `id`.
table(star_cases$case_id == star_cases$id)
TRUE
19101
One case ID to multiple sample IDs.
head(star_cases$sample_ids, 3)
$`9453db51-fff8-4a78-a29c-bb9151e9bd2a`
[1] "6662a85c-37b7-48b1-a8c6-f00171bb8226"
[2] "9bab246d-4a0d-4f28-ba1f-56b19a6f93bb"
[3] "6b8ea6bb-d10b-474a-9b4b-f406285dfb2f"
$`9485e946-f569-46fb-b77e-e5af68f7961a`
[1] "e3f781a2-f087-4abb-8f36-af799e837557"
[2] "cc8c2432-4107-4b5a-9452-3c536dac8baf"
[3] "330292a0-80dd-4fc4-a64c-4fce119dcbb6"
$`981300da-9136-402a-88df-2c76b1e3ad87`
[1] "42c67b29-94a1-4520-9122-b2daa02a03ad"
[2] "9d351761-59cb-40f7-aee2-ce2c6365acc2"
[3] "9276070c-cab5-4ba3-978d-2d18976a8758"
Sample IDs to case IDs.
sample_id_len <- sapply(star_cases$sample_ids, length)
my_ids <- rep(names(sample_id_len), sample_id_len)
sample_id_lookup <- data.frame(
sample_ids = unlist(star_cases$sample_ids),
case_id = my_ids,
row.names = NULL
)
head(sample_id_lookup)
sample_ids case_id
1 6662a85c-37b7-48b1-a8c6-f00171bb8226 9453db51-fff8-4a78-a29c-bb9151e9bd2a
2 9bab246d-4a0d-4f28-ba1f-56b19a6f93bb 9453db51-fff8-4a78-a29c-bb9151e9bd2a
3 6b8ea6bb-d10b-474a-9b4b-f406285dfb2f 9453db51-fff8-4a78-a29c-bb9151e9bd2a
4 e3f781a2-f087-4abb-8f36-af799e837557 9485e946-f569-46fb-b77e-e5af68f7961a
5 cc8c2432-4107-4b5a-9452-3c536dac8baf 9485e946-f569-46fb-b77e-e5af68f7961a
6 330292a0-80dd-4fc4-a64c-4fce119dcbb6 9485e946-f569-46fb-b77e-e5af68f7961a
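The same lookup table can be built in one step with base R's `stack()`, which turns a named list into a two-column data frame. The sketch below uses a toy list with a few of the IDs shown above in place of `star_cases$sample_ids`; the object names `toy_sample_ids`, `lookup`, and `found_case` are illustrative.

```r
# Toy stand-in for star_cases$sample_ids: case ID -> vector of sample IDs.
toy_sample_ids <- list(
  "9453db51-fff8-4a78-a29c-bb9151e9bd2a" = c(
    "6662a85c-37b7-48b1-a8c6-f00171bb8226",
    "9bab246d-4a0d-4f28-ba1f-56b19a6f93bb"
  ),
  "9485e946-f569-46fb-b77e-e5af68f7961a" = c(
    "e3f781a2-f087-4abb-8f36-af799e837557"
  )
)

# stack() returns columns "values" and "ind" (a factor of the list names).
lookup <- stack(toy_sample_ids)
names(lookup) <- c("sample_id", "case_id")
lookup$case_id <- as.character(lookup$case_id)
lookup

# Map a sample ID back to its case ID.
found_case <- lookup$case_id[
  match("e3f781a2-f087-4abb-8f36-af799e837557", lookup$sample_id)
]
found_case
```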
The Cancer Genome Atlas (TCGA), a landmark cancer genomics program, molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. This joint effort between NCI and the National Human Genome Research Institute began in 2006, bringing together researchers from diverse disciplines and multiple institutions.
Data from TCGA (gene expression, copy number variation, clinical information, etc.) are available via the Genomic Data Commons (GDC). Primary sequence data (stored in BAM files) are under controlled access; access should be requested via dbGaP by the PI.
Study Abbreviation | Study Name |
---|---|
LAML | Acute Myeloid Leukemia |
ACC | Adrenocortical carcinoma |
BLCA | Bladder Urothelial Carcinoma |
LGG | Brain Lower Grade Glioma |
BRCA | Breast invasive carcinoma |
CESC | Cervical squamous cell carcinoma and endocervical adenocarcinoma |
CHOL | Cholangiocarcinoma |
LCML | Chronic Myelogenous Leukemia |
COAD | Colon adenocarcinoma |
CNTL | Controls |
ESCA | Esophageal carcinoma |
FPPP | FFPE Pilot Phase II |
GBM | Glioblastoma multiforme |
HNSC | Head and Neck squamous cell carcinoma |
KICH | Kidney Chromophobe |
KIRC | Kidney renal clear cell carcinoma |
KIRP | Kidney renal papillary cell carcinoma |
LIHC | Liver hepatocellular carcinoma |
LUAD | Lung adenocarcinoma |
LUSC | Lung squamous cell carcinoma |
DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma |
MESO | Mesothelioma |
MISC | Miscellaneous |
OV | Ovarian serous cystadenocarcinoma |
PAAD | Pancreatic adenocarcinoma |
PCPG | Pheochromocytoma and Paraganglioma |
PRAD | Prostate adenocarcinoma |
READ | Rectum adenocarcinoma |
SARC | Sarcoma |
SKCM | Skin Cutaneous Melanoma |
STAD | Stomach adenocarcinoma |
TGCT | Testicular Germ Cell Tumors |
THYM | Thymoma |
THCA | Thyroid carcinoma |
UCS | Uterine Carcinosarcoma |
UCEC | Uterine Corpus Endometrial Carcinoma |
UVM | Uveal Melanoma |
From https://www.bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/query.html
A TCGA barcode is composed of a collection of identifiers. Each specifically identifies a TCGA data element. Refer to the following figure for an illustration of how metadata identifiers comprise a barcode. An aliquot barcode contains the highest number of identifiers. For example:
Aliquot barcode: TCGA-G4-6317-02A-11D-2064-05
Participant: TCGA-G4-6317
Sample: TCGA-G4-6317-02
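The participant and sample identifiers in the example can be derived from the aliquot barcode by splitting on hyphens. The sketch below assumes only what the example shows: the participant is the first three fields, and the sample adds the first two characters of the fourth field. `parse_tcga_barcode` is an illustrative helper, not a function from GenomicDataCommons or TCGAbiolinks.

```r
# Hypothetical helper: derive participant and sample barcodes from an
# aliquot barcode, following the pattern of the example above.
parse_tcga_barcode <- function(barcode) {
  parts <- strsplit(barcode, "-", fixed = TRUE)[[1]]
  stopifnot(length(parts) >= 4, parts[1] == "TCGA")
  participant <- paste(parts[1:3], collapse = "-")
  # The fourth field is the sample code followed by a vial letter (e.g. "02A").
  sample_barcode <- paste(participant, substr(parts[4], 1, 2), sep = "-")
  list(participant = participant, sample = sample_barcode)
}

bc <- parse_tcga_barcode("TCGA-G4-6317-02A-11D-2064-05")
bc$participant
bc$sample
```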
Fetch projects.
projects() %>% results(size=100) -> my_projects
str(my_projects, max.level = 1)
List of 9
$ id : chr [1:82] "CGCI-HTMCP-CC" "TARGET-AML" "GENIE-JHU" "GENIE-MSK" ...
$ primary_site :List of 82
$ dbgap_accession_number: chr [1:82] "phs000528" "phs000465" NA NA ...
$ project_id : chr [1:82] "CGCI-HTMCP-CC" "TARGET-AML" "GENIE-JHU" "GENIE-MSK" ...
$ disease_type :List of 82
$ name : chr [1:82] "HIV+ Tumor Molecular Characterization Project - Cervical Cancer" "Acute Myeloid Leukemia" "AACR Project GENIE - Contributed by Johns Hopkins Sidney Kimmel Comprehensive Cancer Center" "AACR Project GENIE - Contributed by Memorial Sloan Kettering Cancer Center" ...
$ releasable : logi [1:82] TRUE TRUE TRUE TRUE TRUE TRUE ...
$ state : chr [1:82] "open" "open" "open" "open" ...
$ released : logi [1:82] TRUE TRUE TRUE TRUE TRUE TRUE ...
- attr(*, "row.names")= int [1:82] 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "class")= chr [1:3] "GDCprojectsResults" "GDCResults" "list"
Project IDs.
my_projects$id
[1] "CGCI-HTMCP-CC" "TARGET-AML"
[3] "GENIE-JHU" "GENIE-MSK"
[5] "GENIE-VICC" "GENIE-MDA"
[7] "TCGA-MESO" "TARGET-ALL-P3"
[9] "TCGA-UVM" "TCGA-KICH"
[11] "TARGET-WT" "TARGET-OS"
[13] "TCGA-DLBC" "GENIE-UHN"
[15] "APOLLO-LUAD" "CDDP_EAGLE-1"
[17] "EXCEPTIONAL_RESPONDERS-ER" "MP2PRT-WT"
[19] "CGCI-HTMCP-DLBCL" "CMI-MPC"
[21] "WCDT-MCRPC" "TCGA-CHOL"
[23] "TCGA-UCS" "TCGA-PCPG"
[25] "CPTAC-2" "TCGA-CESC"
[27] "TCGA-LIHC" "TCGA-ACC"
[29] "CMI-MBC" "TCGA-BRCA"
[31] "CPTAC-3" "TCGA-COAD"
[33] "TCGA-GBM" "TCGA-TGCT"
[35] "NCICCR-DLBCL" "TCGA-LGG"
[37] "FM-AD" "GENIE-GRCC"
[39] "CTSP-DLBCL1" "TARGET-CCSK"
[41] "GENIE-NKI" "TARGET-ALL-P1"
[43] "MATCH-N" "TRIO-CRU"
[45] "CMI-ASC" "TARGET-RT"
[47] "ORGANOID-PANCREATIC" "MATCH-Z1D"
[49] "MATCH-B" "VAREPOP-APOLLO"
[51] "MATCH-Q" "BEATAML1.0-CRENOLANIB"
[53] "MATCH-Y" "OHSU-CNL"
[55] "CGCI-HTMCP-LC" "TARGET-NBL"
[57] "TCGA-SARC" "TCGA-PAAD"
[59] "TCGA-LUAD" "TCGA-PRAD"
[61] "MP2PRT-ALL" "TCGA-LUSC"
[63] "TCGA-LAML" "TCGA-SKCM"
[65] "HCMI-CMDC" "BEATAML1.0-COHORT"
[67] "TCGA-BLCA" "TCGA-READ"
[69] "TCGA-UCEC" "TCGA-THCA"
[71] "TCGA-OV" "TCGA-KIRC"
[73] "MMRF-COMMPASS" "GENIE-DFCI"
[75] "TCGA-HNSC" "TCGA-ESCA"
[77] "CGCI-BLGSP" "TARGET-ALL-P2"
[79] "TCGA-STAD" "REBC-THYR"
[81] "TCGA-KIRP" "TCGA-THYM"
Available (i.e. open) STAR metadata.
get_star_metadata <- function(proj){
files() %>%
filter(cases.project.project_id == proj) %>%
filter(analysis.workflow_type == 'STAR - Counts') %>%
filter(access == 'open') %>%
GenomicDataCommons::select(
c(
default_fields('files'),
"cases.case_id",
"cases.samples.sample_type",
"cases.samples.sample_id"
)
) %>%
results_all()
}
ov_star <- get_star_metadata("TCGA-OV")
str(ov_star, max.level = 1)
List of 17
$ id : chr [1:429] "96aca0af-a776-460d-95ff-87e364e4ac99" "b668c86b-fa56-4d39-9529-5b47081a3faa" "60678f17-e3d7-40cd-99ff-73706497968a" "38fb3b15-f838-4d5b-a830-6051067d8e2e" ...
$ data_format : chr [1:429] "TSV" "TSV" "TSV" "TSV" ...
$ cases :List of 429
$ access : chr [1:429] "open" "open" "open" "open" ...
$ file_name : chr [1:429] "21ff9928-00f0-4b96-8d70-35e9bfad5d40.rna_seq.augmented_star_gene_counts.tsv" "41bdbd88-b4b2-4884-8a44-b34656ae4156.rna_seq.augmented_star_gene_counts.tsv" "03c8e4fe-1e07-4ea3-a154-c17c2e8af508.rna_seq.augmented_star_gene_counts.tsv" "7d14ddef-7a1b-4515-9536-3fc4a9b85702.rna_seq.augmented_star_gene_counts.tsv" ...
$ submitter_id : chr [1:429] "41b13518-8f35-4369-8ff2-2b694d3e0091" "a542eb04-0978-42b1-b5e6-8473ddf04526" "57c246c4-1c0c-470e-8914-696bb7815c02" "d13d4d38-b242-4308-90d7-43aa485abcb5" ...
$ data_category : chr [1:429] "Transcriptome Profiling" "Transcriptome Profiling" "Transcriptome Profiling" "Transcriptome Profiling" ...
$ acl :List of 429
$ type : chr [1:429] "gene_expression" "gene_expression" "gene_expression" "gene_expression" ...
$ file_size : int [1:429] 4240026 4259621 4244112 4241087 4251142 4252582 4257244 4272506 4236251 4250530 ...
$ created_datetime : chr [1:429] "2021-12-13T20:45:56.142462-06:00" "2021-12-13T20:47:21.099504-06:00" "2021-12-13T20:44:24.979694-06:00" "2021-12-13T20:49:39.972683-06:00" ...
$ md5sum : chr [1:429] "c8b0b56114b382ae7855c47092aaf391" "9fe002d9d9512b99ad44edfa0c0bcd37" "625d9b63d9a37c3a80afd29db8ea6641" "f0c4926d57469765026470b21876a8bd" ...
$ updated_datetime : chr [1:429] "2022-01-19T14:47:35.686434-06:00" "2022-01-19T14:47:42.525493-06:00" "2022-01-19T14:47:22.611372-06:00" "2022-01-19T14:47:15.461468-06:00" ...
$ file_id : chr [1:429] "96aca0af-a776-460d-95ff-87e364e4ac99" "b668c86b-fa56-4d39-9529-5b47081a3faa" "60678f17-e3d7-40cd-99ff-73706497968a" "38fb3b15-f838-4d5b-a830-6051067d8e2e" ...
$ data_type : chr [1:429] "Gene Expression Quantification" "Gene Expression Quantification" "Gene Expression Quantification" "Gene Expression Quantification" ...
$ state : chr [1:429] "released" "released" "released" "released" ...
$ experimental_strategy: chr [1:429] "RNA-Seq" "RNA-Seq" "RNA-Seq" "RNA-Seq" ...
- attr(*, "row.names")= int [1:429] 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "class")= chr [1:3] "GDCfilesResults" "GDCResults" "list"
Examine the case entry for a single file (entries in `cases` are named by file ID).
str(ov_star$cases$`96aca0af-a776-460d-95ff-87e364e4ac99`)
'data.frame': 1 obs. of 2 variables:
$ case_id: chr "9446e349-71e6-455a-aa8f-53ec96597146"
$ samples:List of 1
..$ :'data.frame': 1 obs. of 2 variables:
.. ..$ sample_id : chr "1d568bd2-d658-40fa-a341-daa4d2a5bb22"
.. ..$ sample_type: chr "Primary Tumor"
File IDs are unique (`ov_star$id` holds file IDs, not case IDs).
length(unique(ov_star$id)) == length(ov_star$id)
[1] TRUE
Each entry holds a case ID together with its samples.
ov_star$cases$`96aca0af-a776-460d-95ff-87e364e4ac99`
case_id samples
1 9446e349-71e6-455a-aa8f-53ec96597146 1d568bd2....
Build a data frame of sample IDs and sample types, one row per file.
sapply(ov_star$cases, function(x) x$samples) |>
do.call(rbind.data.frame, args = _) -> ov_star_cases
dim(ov_star_cases)
[1] 429 2
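The data frame above keeps only the two sample columns. A minimal sketch of a variant that also carries the file ID and case ID, so each sample can be traced back to its count file, is shown below. It runs on a small hand-made list shaped like `ov_star$cases`; `make_case`, `flatten_cases`, and `mock_cases` are hypothetical names used for illustration and are not part of GenomicDataCommons.

```r
# Hypothetical stand-in for one element of ov_star$cases: a one-row data frame
# with a case_id and a list-column holding a data frame of samples.
make_case <- function(case_id, sample_id, sample_type) {
  x <- data.frame(case_id = case_id)
  x$samples <- list(data.frame(sample_id = sample_id, sample_type = sample_type))
  x
}

# Flatten a list of case entries (named by file ID) into one data frame,
# keeping the file ID and case ID next to each sample.
flatten_cases <- function(cases) {
  rows <- lapply(names(cases), function(file_id) {
    case <- cases[[file_id]]
    samples <- case$samples[[1]]
    data.frame(
      file_id = file_id,
      case_id = case$case_id,
      sample_id = samples$sample_id,
      sample_type = samples$sample_type
    )
  })
  do.call(rbind, rows)
}

# Mock data with made-up identifiers (assumption: same shape as ov_star$cases).
mock_cases <- list(
  "file-a" = make_case("case-1", "sample-1", "Primary Tumor"),
  "file-b" = make_case("case-2", "sample-2", "Recurrent Tumor")
)

flat <- flatten_cases(mock_cases)
print(flat)
```

With the real object, the call would presumably be `flatten_cases(ov_star$cases)`, assuming every entry has the single-case shape shown in the `str()` output above.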
Sample types.
table(ov_star_cases$sample_type)
  Primary Tumor Recurrent Tumor 
            421               8 
Get additional case data for OV.
get_case_metadata <- function(proj){
  treatment_fields <- grep("treatment", available_fields("cases"), ignore.case = TRUE, value = TRUE)
  sample_fields <- grep("samples.sample_", available_fields("cases"), ignore.case = TRUE, value = TRUE)
  cases() %>%
    filter(project.project_id == proj) %>%
    GenomicDataCommons::select(
      c(
        default_fields('cases'),
        sample_fields,
        treatment_fields
      )
    ) %>%
    results_all()
}
ov_cases <- get_case_metadata("TCGA-OV")
str(ov_cases, max.level = 1)
List of 22
$ id : chr [1:608] "cce34351-1700-405b-818f-a598f63a33e8" "cd49126a-ec15-43fa-9e43-3f7460d43f2b" "cd6e5d3d-1c86-40dd-9cb3-b2e2075dec56" "cddbac56-2861-46a5-98a3-df32ab69d5da" ...
$ slide_ids :List of 608
$ submitter_slide_ids :List of 608
$ disease_type : chr [1:608] "Cystic, Mucinous and Serous Neoplasms" "Cystic, Mucinous and Serous Neoplasms" "Cystic, Mucinous and Serous Neoplasms" "Cystic, Mucinous and Serous Neoplasms" ...
$ analyte_ids :List of 608
$ submitter_id : chr [1:608] "TCGA-31-1955" "TCGA-13-1504" "TCGA-24-1469" "TCGA-04-1353" ...
$ submitter_analyte_ids :List of 608
$ aliquot_ids :List of 608
$ submitter_aliquot_ids :List of 608
$ diagnoses :List of 608
$ created_datetime : logi [1:608] NA NA NA NA NA NA ...
$ diagnosis_ids :List of 608
$ samples :List of 608
$ sample_ids :List of 608
$ submitter_sample_ids :List of 608
$ primary_site : chr [1:608] "Ovary" "Ovary" "Ovary" "Ovary" ...
$ submitter_diagnosis_ids:List of 608
$ updated_datetime : chr [1:608] "2019-08-16T15:20:09.988356-05:00" "2019-08-06T14:40:41.923992-05:00" "2019-08-06T14:41:05.270815-05:00" "2019-08-06T14:40:06.221317-05:00" ...
$ case_id : chr [1:608] "cce34351-1700-405b-818f-a598f63a33e8" "cd49126a-ec15-43fa-9e43-3f7460d43f2b" "cd6e5d3d-1c86-40dd-9cb3-b2e2075dec56" "cddbac56-2861-46a5-98a3-df32ab69d5da" ...
$ state : chr [1:608] "released" "released" "released" "released" ...
$ portion_ids :List of 608
$ submitter_portion_ids :List of 608
- attr(*, "row.names")= int [1:608] 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "class")= chr [1:3] "GDCcasesResults" "GDCResults" "list"
Treatment type.
cases() %>%
filter(project.project_id == 'TCGA-OV') %>%
filter(files.access == 'open') %>%
facet("diagnoses.treatments.treatment_type") %>%
aggregations()
$diagnoses.treatments.treatment_type
doc_count key
1 587 pharmaceutical therapy, nos
2 587 radiation therapy, nos
3 21 _missing
Inspect the treatment records for one case.
str(ov_cases$diagnoses$`cce34351-1700-405b-818f-a598f63a33e8`$treatments)
List of 1
$ :'data.frame': 2 obs. of 16 variables:
..$ treatment_intent_type : logi [1:2] NA NA
..$ updated_datetime : chr [1:2] "2019-07-31T16:17:41.335989-05:00" "2019-07-31T16:17:41.335989-05:00"
..$ treatment_id : chr [1:2] "93662700-6cf0-567b-af2e-8289a49e319a" "dafb6206-fade-54bb-a976-97758882f343"
..$ submitter_id : chr [1:2] "TCGA-31-1955_treatment" "TCGA-31-1955_treatment_1"
..$ treatment_type : chr [1:2] "Radiation Therapy, NOS" "Pharmaceutical Therapy, NOS"
..$ state : chr [1:2] "released" "released"
..$ therapeutic_agents : logi [1:2] NA NA
..$ treatment_or_therapy : chr [1:2] "yes" "yes"
..$ created_datetime : chr [1:2] NA "2019-04-28T09:28:03.174985-05:00"
..$ days_to_treatment_end : logi [1:2] NA NA
..$ days_to_treatment_start : logi [1:2] NA NA
..$ regimen_or_line_of_therapy: logi [1:2] NA NA
..$ treatment_effect : logi [1:2] NA NA
..$ initial_disease_status : logi [1:2] NA NA
..$ treatment_anatomic_site : logi [1:2] NA NA
..$ treatment_outcome : logi [1:2] NA NA
For this case, `therapeutic_agents` is NA for both treatment records, so there is no information on specific drugs such as carboplatin or paclitaxel.
Metadata for pancreatic adenocarcinoma.
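The output above covers a single case. One way to check whether any case records a therapeutic agent is to collect the `therapeutic_agents` column across all diagnoses, sketched below on mock data. The helper names (`make_diagnosis`, `collect_agents`), the placeholder `diagnosis_id` value, and the mock list are assumptions for illustration; the real `ov_cases$diagnoses` may contain entries with a different shape.

```r
# Hypothetical stand-in for one element of ov_cases$diagnoses: a one-row data
# frame with a list-column of treatment records.
make_diagnosis <- function(treatment_type, therapeutic_agents) {
  d <- data.frame(diagnosis_id = "dx-placeholder")
  d$treatments <- list(data.frame(
    treatment_type = treatment_type,
    therapeutic_agents = therapeutic_agents
  ))
  d
}

# Pull therapeutic_agents out of every treatment record of every case.
collect_agents <- function(diagnoses) {
  agents <- lapply(diagnoses, function(d) {
    unlist(lapply(d$treatments, function(trt) trt$therapeutic_agents))
  })
  unlist(agents, use.names = FALSE)
}

# Mock data (assumption: same nesting as the str() output above).
treatment_types <- c("Radiation Therapy, NOS", "Pharmaceutical Therapy, NOS")
mock_diagnoses <- list(
  "case-1" = make_diagnosis(treatment_types, c(NA, NA)),
  "case-2" = make_diagnosis(treatment_types, c(NA, NA))
)

agents <- collect_agents(mock_diagnoses)

# TRUE here means no treatment record names a specific agent.
no_agents_recorded <- all(is.na(agents))
print(no_agents_recorded)
```

On the real data the equivalent call would be `collect_agents(ov_cases$diagnoses)`, followed by `table(agents, useNA = "ifany")` to see which agents, if any, are recorded.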
paad_star <- get_star_metadata("TCGA-PAAD")
sapply(paad_star$cases, function(x) x$samples) |>
do.call(rbind.data.frame, args = _) -> paad_star_cases
dim(paad_star_cases)
[1] 183 2
Sample types.
table(paad_star_cases$sample_type)
         Metastatic       Primary Tumor Solid Tissue Normal 
                  1                 178                   4 
Treatment type.
cases() %>%
filter(project.project_id == 'TCGA-PAAD') %>%
filter(files.access == 'open') %>%
facet("diagnoses.treatments.treatment_type") %>%
aggregations()
$diagnoses.treatments.treatment_type
doc_count key
1 185 pharmaceutical therapy, nos
2 185 radiation therapy, nos
Metadata for esophageal carcinoma.
esca_star <- get_star_metadata("TCGA-ESCA")
sapply(esca_star$cases, function(x) x$samples) |>
do.call(rbind.data.frame, args = _) -> esca_star_cases
dim(esca_star_cases)
[1] 198 2
Sample types.
table(esca_star_cases$sample_type)
         Metastatic       Primary Tumor Solid Tissue Normal 
                  1                 184                  13 
Treatment type.
cases() %>%
filter(project.project_id == 'TCGA-ESCA') %>%
filter(files.access == 'open') %>%
facet("diagnoses.treatments.treatment_type") %>%
aggregations()
$diagnoses.treatments.treatment_type
doc_count key
1 185 pharmaceutical therapy, nos
2 185 radiation therapy, nos
Metadata for head and neck squamous cell carcinoma.
hnsc_star <- get_star_metadata("TCGA-HNSC")
sapply(hnsc_star$cases, function(x) x$samples) |>
do.call(rbind.data.frame, args = _) -> hnsc_star_cases
dim(hnsc_star_cases)
[1] 566 2
Sample types.
table(hnsc_star_cases$sample_type)
         Metastatic       Primary Tumor Solid Tissue Normal 
                  2                 520                  44 
Treatment type.
cases() %>%
filter(project.project_id == 'TCGA-HNSC') %>%
filter(files.access == 'open') %>%
facet("diagnoses.treatments.treatment_type") %>%
aggregations()
$diagnoses.treatments.treatment_type
doc_count key
1 528 pharmaceutical therapy, nos
2 528 radiation therapy, nos
Metadata for kidney renal clear cell carcinoma.
kirc_star <- get_star_metadata("TCGA-KIRC")
sapply(kirc_star$cases, function(x) x$samples) |>
do.call(rbind.data.frame, args = _) -> kirc_star_cases
dim(kirc_star_cases)
[1] 614 2
Sample types.
table(kirc_star_cases$sample_type)
Additional - New Primary            Primary Tumor      Solid Tissue Normal 
                       1                      541                       72 
Treatment type.
cases() %>%
filter(project.project_id == 'TCGA-KIRC') %>%
filter(files.access == 'open') %>%
facet("diagnoses.treatments.treatment_type") %>%
aggregations()
$diagnoses.treatments.treatment_type
doc_count key
1 537 pharmaceutical therapy, nos
2 537 radiation therapy, nos
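The same fetch, flatten, and tabulate steps are repeated for each project above. A minimal sketch of wrapping them in one helper follows. The fetch function is passed in as an argument so the example can run offline against a mock; `sample_type_counts`, `mock_fetch`, and `make_case` are hypothetical names, and the mock sample values are made up.

```r
# Tabulate sample types per project. `fetch` is any function that takes a
# project ID and returns a list with a `cases` element shaped like the
# output of get_star_metadata().
sample_type_counts <- function(projects, fetch) {
  lapply(setNames(projects, projects), function(proj) {
    star <- fetch(proj)
    samples <- do.call(rbind, lapply(star$cases, function(x) x$samples[[1]]))
    table(samples$sample_type)
  })
}

# Hypothetical offline stand-in for get_star_metadata().
mock_fetch <- function(proj) {
  make_case <- function(sample_id, sample_type) {
    x <- data.frame(case_id = paste0("case-", sample_id))
    x$samples <- list(data.frame(sample_id = sample_id, sample_type = sample_type))
    x
  }
  list(cases = list(
    "file-a" = make_case("s1", "Primary Tumor"),
    "file-b" = make_case("s2", "Solid Tissue Normal"),
    "file-c" = make_case("s3", "Primary Tumor")
  ))
}

counts <- sample_type_counts(c("TCGA-PAAD", "TCGA-ESCA"), mock_fetch)
print(counts)
```

Against the live API, passing `get_star_metadata` in place of `mock_fetch` should reproduce the per-project tables above, assuming each file maps to a single case entry.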
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicDataCommons_1.26.0 magrittr_2.0.3
[3] lubridate_1.9.3 forcats_1.0.0
[5] stringr_1.5.0 dplyr_1.1.3
[7] purrr_1.0.2 readr_2.1.4
[9] tidyr_1.3.0 tibble_3.2.1
[11] ggplot2_3.4.4 tidyverse_2.0.0
[13] workflowr_1.7.1
loaded via a namespace (and not attached):
[1] gtable_0.3.4 xfun_0.40 bslib_0.5.1
[4] htmlwidgets_1.6.2 processx_3.8.2 callr_3.7.3
[7] tzdb_0.4.0 crosstalk_1.2.0 vctrs_0.6.4
[10] tools_4.3.2 ps_1.7.5 bitops_1.0-7
[13] generics_0.1.3 curl_5.1.0 stats4_4.3.2
[16] fansi_1.0.5 pkgconfig_2.0.3 S4Vectors_0.40.1
[19] lifecycle_1.0.3 GenomeInfoDbData_1.2.11 compiler_4.3.2
[22] git2r_0.32.0 munsell_0.5.0 getPass_0.2-2
[25] httpuv_1.6.12 GenomeInfoDb_1.38.0 htmltools_0.5.6.1
[28] sass_0.4.7 RCurl_1.98-1.12 yaml_2.3.7
[31] crayon_1.5.2 later_1.3.1 pillar_1.9.0
[34] jquerylib_0.1.4 whisker_0.4.1 ellipsis_0.3.2
[37] DT_0.30 cachem_1.0.8 tidyselect_1.2.0
[40] digest_0.6.33 stringi_1.7.12 rprojroot_2.0.3
[43] fastmap_1.1.1 grid_4.3.2 colorspace_2.1-0
[46] cli_3.6.1 utf8_1.2.4 withr_2.5.2
[49] rappdirs_0.3.3 scales_1.2.1 promises_1.2.1
[52] timechange_0.2.0 XVector_0.42.0 rmarkdown_2.25
[55] httr_1.4.7 hms_1.1.3 evaluate_0.22
[58] knitr_1.45 GenomicRanges_1.54.1 IRanges_2.36.0
[61] rlang_1.1.1 Rcpp_1.0.11 glue_1.6.2
[64] xml2_1.3.5 BiocGenerics_0.48.0 rstudioapi_0.15.0
[67] jsonlite_1.8.7 R6_2.5.1 zlibbioc_1.48.0
[70] fs_1.6.3