Last updated: 2022-08-15

Checks: 7 passed, 0 failed

Knit directory: esoph-micro-cancer-workflow/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200916) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
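As a minimal illustration of what the seed does (a base-R sketch, not code from this analysis; the variable names are made up for the example), resetting the same seed before two random draws gives identical results:

```r
# Illustration only: re-seeding reproduces the same random draws.
set.seed(20200916)          # seed value reported above
first_draw <- runif(5)

set.seed(20200916)          # reset to the same seed
second_draw <- runif(5)

# Draws made without re-seeding continue the random stream,
# so they are not expected to match the seeded draws.
third_draw <- runif(5)

identical(first_draw, second_draw)   # TRUE
identical(first_draw, third_draw)    # expected to be FALSE
```

Note that the seed is set once before the document runs, so any step that draws random numbers more than once (for example, re-rendering a jittered plot) moves further along the stream each time.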

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version cb1cd82. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data.zip
    Ignored:    data/
    Ignored:    output/Supplement Figure 2.zip

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/supplemental_figure2.Rmd) and HTML (docs/supplemental_figure2.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author        Date        Message
Rmd   cb1cd82  noah-padgett  2022-08-15  Updated sup figure 2 parts

Histology (+/- Barrett's)

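# fourth-root transform; negative inputs are clamped to 0 first
root <- function(x){
  x <- ifelse(x < 0, 0, x)
  x^0.25
}
# inverse of the fourth-root transform
invroot <- function(x){
  x^4
}
# figure width and height (inches) passed to ggsave()
DIM <- c(6, 4)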

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number),
                Barretts = ifelse(`Barretts.`=="Y",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)

dat <- dat.rna.s %>% 
  dplyr::mutate(Barretts = ifelse(Barrett.s.Esophagus.Reported=="Yes",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Barretts")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Barretts = ifelse(Barrett.s.Esophagus.Reported=="Yes",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor"),
    Barretts = ifelse(Barretts == 1, "Yes", "No")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Barretts")
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 120 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_NCI_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 137 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_NCI_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 127 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(source=="rna", !is.na(Barretts))%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 828 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_rna_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 822 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_rna_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 818 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(source=="wgs", !is.na(Barretts))%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 321 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_wgs_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 304 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_wgs_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 337 rows containing missing values (geom_point).
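
The y-axis in these plots uses the custom fourth-root scale built from root() and invroot(). The sketch below repeats those two definitions in base R and checks, assuming only non-negative abundances matter, that invroot() undoes root() on the axis breaks. The names breaks_pct and on_axis are introduced here for illustration only.

```r
# Same behavior as the definitions in the analysis code (written with ^).
root <- function(x) {
  x <- ifelse(x < 0, 0, x)  # clamp negatives to 0 before taking the root
  x^0.25
}
invroot <- function(x) {
  x^4
}

breaks_pct <- c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100)  # axis breaks used above

# Positions of the breaks on the transformed axis.
on_axis <- root(breaks_pct)
print(round(on_axis, 3))

# The inverse recovers the original breaks (up to floating-point error).
stopifnot(isTRUE(all.equal(invroot(on_axis), breaks_pct)))

# Negative inputs are mapped to 0, so the pair is not invertible below zero.
stopifnot(root(-1) == 0, invroot(root(-1)) == 0)
```

Because the fourth root stretches the low end of the scale, breaks at 0.001 and 0.01 stay visibly separated from 0 on the axis, which is presumably why this transform was chosen for relative abundances that are mostly small.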

Subset by Bacterium

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number),
                Barretts = ifelse(`Barretts.`=="Y",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)

dat <- dat.rna.s %>% 
  dplyr::mutate(Barretts = ifelse(Barrett.s.Esophagus.Reported=="Yes",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Barretts")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Barretts = ifelse(Barrett.s.Esophagus.Reported=="Yes",1,0)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Barretts)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor"),
    Barretts = ifelse(Barretts == 1, "Yes", "No")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Barretts")
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 44 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_NCI_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 40 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_NCI_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 45 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 112 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_rna_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 109 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_rna_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Removed 109 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 42 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_wgs_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 30 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_wgs_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 36 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU %in% c("Prevotella melaninogenica", "Prevotella spp."))%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 22 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_NCI_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_NCI_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 25 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 109 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_rna_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_rna_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 109 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 31 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_wgs_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 33 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_wgs_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Removed 33 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 70 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_NCI_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 67 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_NCI_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 65 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 134 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_rna_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 132 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_rna_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 125 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 52 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_wgs_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 42 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_wgs_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 49 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="16s", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 3 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_NCI_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 4 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_NCI_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 6 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="rna", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_rna_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 110 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_rna_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Barretts), source=="wgs", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Barretts, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Barretts", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 45 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2C_tcga_wgs_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 41 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2C_tcga_wgs_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 45 rows containing missing values (geom_point).
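
Across this page, the geom_point warning for a given plot often reports a different number of removed rows each time the plot is drawn (for example 44, 40, and 45 for the 16s Fusobacterium nucleatum plot). A likely explanation, stated here as an assumption and not verified against the data, is that geom_jitter() draws fresh random offsets on every render and, by default, also jitters vertically, so points at 0 can fall just below the lower axis limit and be dropped. The base-R sketch below imitates that effect with made-up data; n_zero, jitter_height, and count_dropped() are hypothetical names.

```r
# Hypothetical data: 100 observations that sit exactly at the lower limit (0).
n_zero <- 100
y <- rep(0, n_zero)
jitter_height <- 0.01   # arbitrary small vertical jitter, for illustration

# Count points that fall below the lower limit after one round of jitter.
count_dropped <- function(y, height, lower = 0) {
  jittered <- y + runif(length(y), min = -height, max = height)
  sum(jittered < lower)
}

set.seed(20200916)
# Three simulated renders of the same data; the counts usually differ.
draws <- replicate(3, count_dropped(y, jitter_height))
print(draws)

# With no vertical jitter, nothing is pushed below the limit.
stopifnot(count_dropped(y, height = 0) == 0)
```

If that explanation holds, passing height = 0 to geom_jitter() would keep the jitter horizontal only and make the counts stable across renders. The analysis code on this page is shown as it was run.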

Sex

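# fourth-root transform; negative inputs are clamped to 0 first
root <- function(x){
  x <- ifelse(x < 0, 0, x)
  x^0.25
}
# inverse of the fourth-root transform
invroot <- function(x){
  x^4
}
# figure width and height (inches) passed to ggsave()
DIM <- c(6, 4)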

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number),
                Gender = ifelse(gender=="M","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)

dat <- dat.rna.s %>% 
  dplyr::mutate(Gender = ifelse(Gender=="male","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Gender")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Gender = ifelse(Gender=="male","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Gender")
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 122 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_NCI_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 137 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_NCI_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 129 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(source=="rna", !is.na(Gender))%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 817 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_rna_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 840 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_rna_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 749 rows containing non-finite values (stat_ydensity).
Warning: Removed 823 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(source=="wgs", !is.na(Gender))%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 322 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_wgs_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 305 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_wgs_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 314 rows containing missing values (geom_point).
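
Each plot on this page repeats the same layers and differs only in the grouping variable, the data source, and the bacterium. One possible way to factor that out is sketched below. plot_abundance() and the toy data frame are hypothetical and not part of the original analysis, and the sketch assumes the ggplot2, dplyr, and scales packages are installed.

```r
library(ggplot2)

# Fourth-root transform and its inverse, matching the analysis code.
root <- function(x) ifelse(x < 0, 0, x)^0.25
invroot <- function(x) x^4

# Hypothetical helper: one violin + jitter plot for a given source and
# grouping column, optionally restricted to a set of OTUs.
plot_abundance <- function(data, group_var, src, otus = NULL) {
  d <- dplyr::filter(data, !is.na(.data[[group_var]]), .data$source == src)
  if (!is.null(otus)) {
    d <- dplyr::filter(d, .data$OTU %in% otus)
  }
  ggplot(d, aes(x = .data[[group_var]], y = .data$Abund)) +
    geom_violin(scale = "width", adjust = 1) +
    geom_jitter(alpha = 0.5, width = 0.25) +
    scale_y_continuous(
      # trans = mirrors the code on this page; newer ggplot2 releases
      # prefer the argument name transform =
      trans = scales::trans_new("root", root, invroot),
      breaks = c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100),
      labels = c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100),
      limits = c(0, 110)
    ) +
    labs(x = group_var, y = "Relative Abundance (%)") +
    theme_classic()
}

# Toy data standing in for analysis.dat (all values are invented).
toy <- data.frame(
  OTU = rep(c("Fusobacterium nucleatum", "Campylobacter concisus"), each = 4),
  source = "16s",
  Gender = rep(c("Male", "Female"), times = 4),
  Abund = c(0.5, 2, 10, 0.01, 1, 3, 0.2, 40)
)

p <- plot_abundance(toy, "Gender", "16s", otus = "Fusobacterium nucleatum")
```

The returned object can be passed to ggsave() with width = DIM[1] and height = DIM[2] in the same way as the plots on this page.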

Subset by Bacterium

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number),
                Gender = ifelse(gender=="M","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)

dat <- dat.rna.s %>% 
  dplyr::mutate(Gender = ifelse(Gender=="male","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Gender")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Gender = ifelse(Gender=="male","Male","Female")) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Gender)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Gender")
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 39 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_NCI_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 39 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_NCI_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 39 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 112 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_rna_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_rna_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 110 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 42 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_wgs_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 34 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_wgs_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 41 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU %in% c("Prevotella melaninogenica", "Prevotella spp."))%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 26 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_NCI_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 22 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_NCI_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 19 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_rna_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 110 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_rna_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Removed 110 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 29 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_wgs_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 32 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_wgs_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 36 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 62 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_NCI_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 63 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_NCI_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 65 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 123 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_rna_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 134 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_rna_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 129 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 41 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_wgs_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 48 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_wgs_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 48 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="16s", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 5 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_NCI_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 5 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_NCI_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 2 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="rna", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 108 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_rna_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 111 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_rna_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 107 rows containing non-finite values (stat_ydensity).
Warning: Removed 109 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Gender), source=="wgs", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Gender, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Gender", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 49 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2D_tcga_wgs_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 46 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2D_tcga_wgs_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 45 rows containing missing values (geom_point).
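
The geom_point counts in the warnings above change between printing p and each ggsave() call, although the data do not. One plausible explanation (an assumption, the output itself does not confirm it) is that geom_jitter() also jitters vertically by default, so points at 0% can land below the lower axis limit and be dropped, with new random offsets drawn on every render. A minimal sketch on made-up data, restricting the jitter to the horizontal direction with height = 0; the toy data frame and the name p_fixed are illustrative only:

```r
library(ggplot2)

# Same transform pair as in the analysis, repeated so the block runs on its own
root <- function(x) {
  x <- ifelse(x < 0, 0, x)
  x^0.25
}
invroot <- function(x) x^4

# Made-up data with several zeros, standing in for analysis.dat
toy <- data.frame(
  Gender = rep(c("F", "M"), each = 6),
  Abund  = c(0, 0, 0, 0.5, 2, 10, 0, 0, 0.01, 1, 5, 40)
)

p_fixed <- ggplot(toy, aes(x = Gender, y = Abund)) +
  geom_violin(scale = "width", adjust = 1) +
  # height = 0 keeps every point at its observed abundance
  geom_jitter(alpha = 0.5, width = 0.25, height = 0) +
  scale_y_continuous(
    trans = scales::trans_new("root", root, invroot),
    limits = c(0, 110)
  ) +
  theme_classic()

# y positions of the jitter layer (layer 2) after the plot is built
y_built <- layer_data(p_fixed, 2)$y
```

With vertical jitter switched off, no point should leave the 0 to 110 range, so the geom_point count would be expected to stay the same across renders.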

Race

# fourth-root transform for the y-axis; negative inputs are clamped to 0
root<-function(x){
  x <- ifelse(x < 0, 0, x)
  x^(0.25)
}
# inverse of the fourth-root transform
invroot<-function(x){
  x^(4)
}
DIM <- c(6, 4)
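
A quick sanity check of the transform pair, as a sketch: unlike a log scale, the fourth root is defined at 0, which is why a break at 0 can sit on the axis, and invroot() undoes root() for non-negative values. The breaks vector is copied from the plotting code; pos is an illustrative name.

```r
# Same definitions as above, repeated so the block runs on its own
root <- function(x) {
  x <- ifelse(x < 0, 0, x)
  x^0.25
}
invroot <- function(x) x^4

breaks <- c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100)

# Positions of the breaks on the transformed axis
pos <- root(breaks)

# Round trip: back-transforming recovers the breaks up to floating-point error
all.equal(invroot(pos), breaks)

# Negative inputs are clamped to 0 before the root is taken
root(-5)
```

The round trip only holds for non-negative input, because root() maps every negative value to 0.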

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)

dat <- dat.rna.s %>% 
  dplyr::mutate(Race = race) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Race")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Race = race) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Race")
analysis.dat$Race[analysis.dat$Race == "asian"] <- "Asian"
analysis.dat$Race[analysis.dat$Race == "B"] <- "Black"
analysis.dat$Race[analysis.dat$Race == "black or african american"] <- "Black"
analysis.dat$Race[analysis.dat$Race == "H"] <- "Hispanic"
analysis.dat$Race[analysis.dat$Race == "O"] <- "Other"
analysis.dat$Race[analysis.dat$Race == "W"] <- "White"
analysis.dat$Race[analysis.dat$Race == "white"] <- "White"
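
The seven assignments above can also be written as one lookup table, which keeps the raw-to-label mapping in a single place. This is only a sketch: race_map and recode_race are hypothetical names, and the input is converted to character first on the assumption that Race may arrive as a factor.

```r
# Raw codes seen in the merged data mapped to display labels
race_map <- c(
  "asian"                     = "Asian",
  "B"                         = "Black",
  "black or african american" = "Black",
  "H"                         = "Hispanic",
  "O"                         = "Other",
  "W"                         = "White",
  "white"                     = "White"
)

# Replace known codes; leave NA and unrecognised values untouched
recode_race <- function(x) {
  x <- as.character(x)
  hit <- x %in% names(race_map)
  x[hit] <- unname(race_map[x[hit]])
  x
}

toy <- c("asian", "B", "black or african american", "H", "O", "W", "white", NA, "Asian")
recode_race(toy)
```

In the analysis this would be applied as analysis.dat$Race <- recode_race(analysis.dat$Race).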

p <- analysis.dat %>%
  filter(!is.na(Race), source=="16s")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 132 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_NCI_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 141 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_NCI_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 136 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(source=="rna", !is.na(Race))%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 651 rows containing non-finite values (stat_ydensity).
Warning: Removed 714 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_rna_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 651 rows containing non-finite values (stat_ydensity).
Warning: Removed 712 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_rna_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 651 rows containing non-finite values (stat_ydensity).
Warning: Removed 725 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(source=="wgs", !is.na(Race))%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 305 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_wgs_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 309 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_wgs_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 112 rows containing non-finite values (stat_ydensity).
Warning: Removed 316 rows containing missing values (geom_point).
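
The plotting blocks in this section differ only in the filter and the grouping column, so the shared ggplot call could be factored into one function. A sketch under that assumption; plot_abund is a hypothetical helper, not part of the original analysis, and the toy data frame is made up:

```r
library(ggplot2)

root <- function(x) {
  x <- ifelse(x < 0, 0, x)
  x^0.25
}
invroot <- function(x) x^4

# Hypothetical helper: violin plus jitter of Abund by one grouping column
plot_abund <- function(data, group_var, x_label = group_var) {
  ggplot(data, aes(x = .data[[group_var]], y = Abund)) +
    geom_violin(scale = "width", adjust = 1) +
    geom_jitter(alpha = 0.5, width = 0.25) +
    scale_y_continuous(
      trans  = scales::trans_new("root", root, invroot),
      breaks = c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100),
      labels = c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100),
      limits = c(0, 110)
    ) +
    labs(x = x_label, y = "Relative Abundance (%)") +
    theme_classic()
}

# Made-up data standing in for a filtered analysis.dat
toy <- data.frame(
  Race  = rep(c("Asian", "White"), each = 5),
  Abund = c(0, 0.5, 1, 5, 20, 0, 0.01, 2, 10, 60)
)
p_toy <- plot_abund(toy, "Race")
```

A call such as plot_abund(dplyr::filter(analysis.dat, !is.na(Race), source == "wgs"), "Race") would then stand in for one of the blocks above.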

Subset by Bacterium

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)

dat <- dat.rna.s %>% 
  dplyr::mutate(Race = race) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Race")
dat <- dat.wgs.s %>% 
  dplyr::mutate(Race = race) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, Race)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor")
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"Race")
analysis.dat$Race[analysis.dat$Race == "asian"] <- "Asian"
analysis.dat$Race[analysis.dat$Race == "B"] <- "Black"
analysis.dat$Race[analysis.dat$Race == "black or african american"] <- "Black"
analysis.dat$Race[analysis.dat$Race == "H"] <- "Hispanic"
analysis.dat$Race[analysis.dat$Race == "O"] <- "Other"
analysis.dat$Race[analysis.dat$Race == "W"] <- "White"
analysis.dat$Race[analysis.dat$Race == "white"] <- "White"

p <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 46 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_NCI_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 36 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_NCI_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 44 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 96 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_rna_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 96 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_rna_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 96 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 37 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_wgs_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 39 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_wgs_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 38 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU %in% c("Prevotella melaninogenica", "Prevotella spp."))%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 21 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_NCI_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 21 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_NCI_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 25 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 98 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_rna_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 97 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_rna_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 97 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 36 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_wgs_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 35 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_wgs_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 34 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 59 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_NCI_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 59 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_NCI_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 69 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 108 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_rna_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 113 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_rna_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 114 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 47 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_wgs_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 52 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_wgs_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 46 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="16s", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 2 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_NCI_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 1 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_NCI_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 6 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="rna", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 99 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_rna_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 97 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_rna_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 93 rows containing non-finite values (stat_ydensity).
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 99 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Race), source=="wgs", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Race, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Race", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 48 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2E_tcga_wgs_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 51 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2E_tcga_wgs_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 16 rows containing non-finite values (stat_ydensity).
Warning: Removed 50 rows containing missing values (geom_point).
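
The twelve per-bacterium blocks above follow one naming pattern: a dataset label, a short bacterium label, and two file extensions. A sketch of how the output paths could be generated from a grid instead of typed by hand; sources, otus, and grid are illustrative names, and the labels are read off the file names used in this section:

```r
# Dataset label in the file name mapped to the value of `source`
sources <- c(NCI = "16s", tcga_rna = "rna", tcga_wgs = "wgs")

# Short label in the file name mapped to the OTU used in the filter
otus <- c(
  fuso    = "Fusobacterium nucleatum",
  prevo   = "Prevotella melaninogenica",
  campy   = "Campylobacter concisus",
  strepto = "Streptococcus sanguinis"
)

# One row per dataset and bacterium combination
grid <- expand.grid(src = names(sources), otu = names(otus),
                    stringsAsFactors = FALSE)
grid$source_value <- unname(sources[grid$src])
grid$otu_value    <- unname(otus[grid$otu])
grid$stem <- file.path(
  "output",
  paste0("supplemental_figure2E_", grid$src, "_", grid$otu)
)

# Both file types for the first combination
paste0(grid$stem[1], c(".pdf", ".png"))
```

One caveat: the 16s Prevotella block above also keeps "Prevotella spp.", so a loop over this grid would need a special case for that combination.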

Stage

# fourth-root transform for the y-axis; negative inputs are clamped to 0
root<-function(x){
  x <- ifelse(x < 0, 0, x)
  x^(0.25)
}
# inverse of the fourth-root transform
invroot<-function(x){
  x^(4)
}
DIM <- c(6, 4)

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)

dat <- dat.rna.s %>% 
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"tumor.stage")
dat <- dat.wgs.s %>% 
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor"),
    Tumor_Stage = tumor.stage
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"tumor.stage")
#analysis.dat$Tumor_Stage[analysis.dat$Tumor_Stage == "0"] <- ""

p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 127 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_NCI_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 126 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_NCI_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 129 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(source=="rna", !is.na(Tumor_Stage))%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 658 rows containing non-finite values (stat_ydensity).
Warning: Removed 728 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_rna_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 658 rows containing non-finite values (stat_ydensity).
Warning: Removed 730 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_rna_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 658 rows containing non-finite values (stat_ydensity).
Warning: Removed 734 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(source=="wgs", !is.na(Tumor_Stage))%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 98 rows containing non-finite values (stat_ydensity).
Warning: Removed 276 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_wgs_combined.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 98 rows containing non-finite values (stat_ydensity).
Warning: Removed 282 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_wgs_combined.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 98 rows containing non-finite values (stat_ydensity).
Warning: Removed 295 rows containing missing values (geom_point).
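
The "Joining, by = ..." messages appear because full_join() is called without by and falls back to every shared column. Since the three tables are reduced to the same columns first, the join keys are all columns; if no row is identical across tables (an assumption, likely here because source differs per dataset), the result is the same set of rows that stacking would give. A sketch on made-up data, with a, b, joined, and stacked as illustrative names:

```r
library(dplyr)

# Made-up stand-ins for two of the per-source tables
a <- data.frame(
  OTU = c("x", "y"), Abundance = c(0.1, 0), ID = c("s1", "s2"),
  source = "16s", tumor.stage = c("I", "II")
)
b <- data.frame(
  OTU = c("x", "y"), Abundance = c(0.3, 0.2), ID = c("t1", "t2"),
  source = "rna", tumor.stage = c("III", NA)
)

# Stating `by` explicitly makes the keys visible and silences the message
joined <- full_join(a, b, by = names(a))

# Stacking the rows yields the same rows when no row is shared
stacked <- bind_rows(a, b)

nrow(joined) == nrow(stacked)
```

Either form gives the combined table; the explicit by only documents the keys and removes the message from the output.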

Subset by Bacterium

# merge datasets by subsetting to specific variables then merging
analysis.dat <- dat.16s.s %>% 
  dplyr::mutate(ID = as.factor(accession.number)) %>%
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)

dat <- dat.rna.s %>% 
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)
analysis.dat <- full_join(analysis.dat, dat)
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"tumor.stage")
dat <- dat.wgs.s %>% 
  dplyr::select(OTU, sample_type, tumor, Abundance, ID, source, tumor.stage)

analysis.dat <- full_join(analysis.dat, dat) %>%
  mutate(
    pres = ifelse(Abundance > 0, 1, 0),
    Abund = Abundance*100,
    Tumor = ifelse(tumor==1, "Tumor", "No Tumor"),
    Tumor_Stage = tumor.stage
  )
Joining, by = c("OTU", "sample_type", "tumor", "Abundance", "ID", "source",
"tumor.stage")
#analysis.dat$Tumor_Stage[analysis.dat$Tumor_Stage == "0"] <- ""


p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 46 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_NCI_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 46 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_NCI_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 40 rows containing missing values (geom_point).
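The `root` and `invroot` functions passed to `scales::trans_new()` are defined earlier in the analysis and are not shown in this section. The sketch below is only a stand-in: the cube-root power is an assumption, chosen to illustrate why a root scale suits these plots. Unlike a log scale, it keeps zero abundances finite, so the `0` break can sit on the axis.

```r
# Hypothetical stand-ins for the helpers used in the plots;
# the exponent (1/3) is an assumption, not taken from the analysis.
root    <- function(x) x^(1/3)
invroot <- function(x) x^3

breaks <- c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100)

# The pair must be mutual inverses for trans_new() to label the axis correctly
stopifnot(isTRUE(all.equal(invroot(root(breaks)), breaks)))

is.finite(root(0))    # zero stays on the scale under a root transform
is.finite(log10(0))   # log10(0) is -Inf, so zeros would drop off a log scale
```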
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 97 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_rna_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 97 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_rna_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 96 rows containing missing values (geom_point).
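The `geom_point` counts in these warnings change between the printed plot and each `ggsave()` call (for example 97, 97, then 96 above). A likely explanation, offered here as an assumption rather than something verified against this analysis, is that `geom_jitter()` also jitters vertically by default, so a random subset of points near zero lands below the lower scale limit on every render and is dropped. The sketch below uses made-up data and shows one way to avoid that: set `height = 0` and fix the jitter seed (the seed value is arbitrary).

```r
library(ggplot2)

# Toy data (hypothetical): many zero abundances, as is common for sparse taxa
toy <- data.frame(
  Tumor_Stage = rep(c("I", "II"), each = 5),
  Abund       = c(0, 0, 0, 1, 5, 0, 0, 2, 10, 50)
)

p_toy <- ggplot(toy, aes(x = Tumor_Stage, y = Abund)) +
  geom_point(
    alpha = 0.5,
    # height = 0 leaves y untouched; seed makes the horizontal jitter repeatable
    position = position_jitter(width = 0.25, height = 0, seed = 20200916)
  ) +
  scale_y_continuous(limits = c(0, 110)) +
  theme_classic()

# Count points that would be dropped as missing after the position adjustment
built <- ggplot_build(p_toy)
sum(is.na(built$data[[1]]$y))
```

With this variant, the on-screen figure and the saved PDF and PNG files would show the same points, at the cost of zero-abundance samples stacking on a single horizontal line.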
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Fusobacterium nucleatum")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 38 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_wgs_fuso.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 35 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_wgs_fuso.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 36 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU %in% c("Prevotella melaninogenica", "Prevotella spp."))%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 20 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_NCI_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 18 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_NCI_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 14 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 97 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_rna_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 96 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_rna_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 97 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Prevotella melaninogenica")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 31 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_wgs_prevo.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 29 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_wgs_prevo.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 34 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 66 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_NCI_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 57 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_NCI_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 66 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 112 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_rna_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 118 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_rna_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 103 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Campylobacter concisus")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 39 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_wgs_campy.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 41 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_wgs_campy.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 48 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="16s", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 6 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_NCI_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 4 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_NCI_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Groups with fewer than two data points have been dropped.
Warning: Removed 4 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="rna", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 97 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_rna_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 96 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_rna_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 94 rows containing non-finite values (stat_ydensity).
Warning: Removed 96 rows containing missing values (geom_point).
p <- analysis.dat %>%
  filter(!is.na(Tumor_Stage), source=="wgs", OTU == "Streptococcus sanguinis")%>%
  ggplot(aes(x=Tumor_Stage, y=Abund))+
    geom_violin(scale="width", adjust=1)+
    geom_jitter(alpha=0.5, width = 0.25)+
    scale_y_continuous(
      trans=scales::trans_new("root", root, invroot),
      breaks=c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      labels = c(0, 0.001,0.01, 0.1, 1,10,50, 100),
      limits = c(0, 110)
    ) + 
    labs(x="Tumor_Stage", y="Relative Abundance (%)")+
    theme_classic()
p
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 45 rows containing missing values (geom_point).

ggsave("output/supplemental_figure2F_tcga_wgs_strepto.pdf", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 42 rows containing missing values (geom_point).
ggsave("output/supplemental_figure2F_tcga_wgs_strepto.png", p, units = "in", width = DIM[1], height = DIM[2])
Warning: Removed 14 rows containing non-finite values (stat_ydensity).
Warning: Removed 45 rows containing missing values (geom_point).
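The twelve plots above differ only in the `source` value, the OTU name, and the output file stem. As a possible simplification, the sketch below wraps the shared plotting code in a helper and builds the file stems from a lookup table. `plot_stage_abundance`, `src_tag`, `otu_tag`, and `specs` are hypothetical names introduced here, and `root` and `invroot` are taken as arguments because their definitions live elsewhere in the analysis. The helper accepts a vector of OTU names because the 16S Prevotella plot also includes "Prevotella spp.".

```r
# Hypothetical helper: one violin + jitter plot for a given source and OTU set.
# ggplot2 and scales are referenced by namespace and only needed when called.
plot_stage_abundance <- function(data, src, otus, root, invroot) {
  keep <- !is.na(data$Tumor_Stage) & data$source == src & data$OTU %in% otus
  sub  <- data[which(keep), ]
  ggplot2::ggplot(sub, ggplot2::aes(x = Tumor_Stage, y = Abund)) +
    ggplot2::geom_violin(scale = "width", adjust = 1) +
    ggplot2::geom_jitter(alpha = 0.5, width = 0.25) +
    ggplot2::scale_y_continuous(
      trans  = scales::trans_new("root", root, invroot),
      breaks = c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100),
      labels = c(0, 0.001, 0.01, 0.1, 1, 10, 50, 100),
      limits = c(0, 110)
    ) +
    ggplot2::labs(x = "Tumor_Stage", y = "Relative Abundance (%)") +
    ggplot2::theme_classic()
}

# File-name pieces as they appear in the ggsave() calls of this section
src_tag <- c("16s" = "NCI", "rna" = "tcga_rna", "wgs" = "tcga_wgs")
otu_tag <- c(
  "Fusobacterium nucleatum"   = "fuso",
  "Prevotella melaninogenica" = "prevo",
  "Campylobacter concisus"    = "campy",
  "Streptococcus sanguinis"   = "strepto"
)

# One row per plot: 3 sources x 4 bacteria
specs <- expand.grid(source = names(src_tag), OTU = names(otu_tag),
                     stringsAsFactors = FALSE)
specs$file <- paste0("output/supplemental_figure2F_",
                     src_tag[specs$source], "_", otu_tag[specs$OTU])

# Usage sketch (not run here; needs analysis.dat, root, invroot and DIM):
# for (i in seq_len(nrow(specs))) {
#   p <- plot_stage_abundance(analysis.dat, specs$source[i], specs$OTU[i], root, invroot)
#   ggplot2::ggsave(paste0(specs$file[i], ".pdf"), p, units = "in",
#                   width = DIM[1], height = DIM[2])
# }
```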

sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_1.1.1     dendextend_1.16.0 ggdendro_0.1.23   reshape2_1.4.4   
 [5] car_3.1-0         carData_3.0-5     gvlma_1.0.0.3     patchwork_1.1.1  
 [9] viridis_0.6.2     viridisLite_0.4.0 gridExtra_2.3     xtable_1.8-4     
[13] kableExtra_1.3.4  MASS_7.3-56       data.table_1.14.2 readxl_1.4.0     
[17] forcats_0.5.1     stringr_1.4.0     dplyr_1.0.9       purrr_0.3.4      
[21] readr_2.1.2       tidyr_1.2.0       tibble_3.1.7      ggplot2_3.3.6    
[25] tidyverse_1.3.2   lmerTest_3.1-3    lme4_1.1-30       Matrix_1.4-1     
[29] vegan_2.6-2       lattice_0.20-45   permute_0.9-7     phyloseq_1.40.0  
[33] workflowr_1.7.0  

loaded via a namespace (and not attached):
  [1] googledrive_2.0.0      minqa_1.2.4            colorspace_2.0-3      
  [4] ellipsis_0.3.2         rprojroot_2.0.3        XVector_0.36.0        
  [7] fs_1.5.2               rstudioapi_0.13        farver_2.1.1          
 [10] fansi_1.0.3            lubridate_1.8.0        xml2_1.3.3            
 [13] codetools_0.2-18       splines_4.2.0          cachem_1.0.6          
 [16] knitr_1.39             ade4_1.7-19            jsonlite_1.8.0        
 [19] nloptr_2.0.3           broom_1.0.0            cluster_2.1.3         
 [22] dbplyr_2.2.1           BiocManager_1.30.18    compiler_4.2.0        
 [25] httr_1.4.3             backports_1.4.1        assertthat_0.2.1      
 [28] fastmap_1.1.0          gargle_1.2.0           cli_3.3.0             
 [31] later_1.3.0            htmltools_0.5.2        tools_4.2.0           
 [34] igraph_1.3.4           gtable_0.3.0           glue_1.6.2            
 [37] GenomeInfoDbData_1.2.8 Rcpp_1.0.8.3           Biobase_2.56.0        
 [40] cellranger_1.1.0       jquerylib_0.1.4        vctrs_0.4.1           
 [43] Biostrings_2.64.0      rhdf5filters_1.8.0     multtest_2.52.0       
 [46] svglite_2.1.0          ape_5.6-2              nlme_3.1-157          
 [49] iterators_1.0.14       xfun_0.31              ps_1.7.0              
 [52] rvest_1.0.2            lifecycle_1.0.1        googlesheets4_1.0.0   
 [55] getPass_0.2-2          zlibbioc_1.42.0        scales_1.2.0          
 [58] hms_1.1.1              promises_1.2.0.1       parallel_4.2.0        
 [61] biomformat_1.24.0      rhdf5_2.40.0           yaml_2.3.5            
 [64] sass_0.4.2             stringi_1.7.6          highr_0.9             
 [67] S4Vectors_0.34.0       foreach_1.5.2          BiocGenerics_0.42.0   
 [70] boot_1.3-28            GenomeInfoDb_1.32.2    systemfonts_1.0.4     
 [73] rlang_1.0.2            pkgconfig_2.0.3        bitops_1.0-7          
 [76] evaluate_0.15          Rhdf5lib_1.18.2        processx_3.7.0        
 [79] tidyselect_1.1.2       plyr_1.8.7             magrittr_2.0.3        
 [82] R6_2.5.1               IRanges_2.30.0         generics_0.1.3        
 [85] DBI_1.1.3              withr_2.5.0            pillar_1.8.0          
 [88] haven_2.5.0            whisker_0.4            mgcv_1.8-40           
 [91] abind_1.4-5            survival_3.3-1         RCurl_1.98-1.8        
 [94] modelr_0.1.8           crayon_1.5.1           utf8_1.2.2            
 [97] tzdb_0.3.0             rmarkdown_2.14         grid_4.2.0            
[100] callr_3.7.1            git2r_0.30.1           webshot_0.5.3         
[103] reprex_2.0.1           digest_0.6.29          httpuv_1.6.5          
[106] numDeriv_2016.8-1.1    stats4_4.2.0           munsell_0.5.0         
[109] bslib_0.4.0