Last updated: 2021-02-04
Checks: 6 1
Knit directory: esoph-micro-cancer-workflow/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish
to commit the R Markdown file and build the HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20200916)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version b9a498d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/
Unstaged changes:
Modified: analysis/results-question-3.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/results-question-3.Rmd
) and HTML (docs/results-question-3.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | b9a498d | noah-padgett | 2021-01-26 | fixed issue with subsetting |
html | b9a498d | noah-padgett | 2021-01-26 | fixed issue with subsetting |
Rmd | 6585907 | noah-padgett | 2021-01-21 | updated figures with dendrogram |
Q3: Is fuso associated with tumor stage (pTNM) in either data set? Does X bacteria predict stage? Multivariable w/ age, sex, BMI, history of Barrett's
Add to this analysis:
TCGA drop “not reported” from tumor stage.
# in long format
table(dat.16s$tumor.stage)
0 1 I II III IV
11088 264 6336 13464 6072 2640
not reported
1848
# by subject
dat <- dat.16s %>% filter(OTU == "Fusobacterium_nucleatum")
table(dat$tumor.stage)
0 1 I II III IV
42 1 24 51 23 10
not reported
7
sum(table(dat$tumor.stage)) # sample size met
[1] 158
dat.16s.s$Abundancej <- dat.16s.s$Abundance+0.01
mean.dat <- dat.16s.s %>%
group_by(tumor.stage, OTU) %>%
summarize(M = mean(Abundance))
`summarise()` has grouped output by 'tumor.stage'. You can override using the `.groups` argument.
ggplot(dat.16s.s, aes(x=tumor.stage, y=Abundancej))+
geom_violin()+
geom_jitter(alpha=0.25,width = 0.25)+
geom_point(data=mean.dat, aes(x=tumor.stage, y = M), size=2, alpha =0.9, color="red")+
labs(x="Tumor Stage",
title="Distribution of abundance across tumor stage",
subtitle="Red dot is average abundnace")+
scale_y_continuous(
breaks=c(0, 10, 100, 200, 300, 400, 500),
limits = c(0,500),
trans="sqrt")+
facet_wrap(.~OTU, nrow=1)+
theme_classic()
Stage “1” has only 1 unique sample and will be dropped from subsequent analyses.
dat.16s.s <- dat.16s.s %>%
filter(tumor.stage != "1")%>%
mutate(tumor.stage = droplevels(tumor.stage, exclude="1"))
fit <- nnet::multinom(tumor.stage ~ OTU, data=dat.16s.s)
# weights: 30 (20 variable)
initial value 1125.224947
iter 10 value 1005.376265
final value 1005.153918
converged
summary(fit)
Call:
nnet::multinom(formula = tumor.stage ~ OTU, data = dat.16s.s)
Coefficients:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I -0.5596573 4.026999e-05 5.690943e-05
II 0.1941426 3.995800e-05 -9.227993e-06
III -0.6022530 9.552912e-05 1.105414e-04
IV -1.4353505 3.339310e-04 3.376027e-04
not reported -1.7915909 -2.164854e-04 -2.141523e-04
OTUPrevotella melaninogenica
I 3.976448e-05
II 4.157416e-05
III 9.500459e-05
IV 3.338067e-04
not reported -2.165636e-04
Std. Errors:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I 0.2558825 0.3618742 0.3618723
II 0.2083657 0.2946756 0.2946764
III 0.2594013 0.3668473 0.3668454
IV 0.3518980 0.4976310 0.4976292
not reported 0.4082124 0.5773327 0.5773309
OTUPrevotella melaninogenica
I 0.3618743
II 0.2946755
III 0.3668474
IV 0.4976310
not reported 0.5773328
Residual Deviance: 2010.308
AIC: 2050.308
# obtain significance tests
z <- summary(fit)$coefficients/summary(fit)$standard.errors
z
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I -2.1871652 0.0001112817 1.572638e-04
II 0.9317397 0.0001356000 -3.131568e-05
III -2.3217038 0.0002604057 3.013297e-04
IV -4.0788818 0.0006710414 6.784222e-04
not reported -4.3888697 -0.0003749750 -3.709351e-04
OTUPrevotella melaninogenica
I 0.0001098848
II 0.0001410845
III 0.0002589758
IV 0.0006707916
not reported -0.0003751106
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I 2.873046e-02 0.9999112 0.9998745
II 3.514711e-01 0.9998918 0.9999750
III 2.024889e-02 0.9997922 0.9997596
IV 4.525283e-05 0.9994646 0.9994587
not reported 1.139413e-05 0.9997008 0.9997040
OTUPrevotella melaninogenica
I 0.9999123
II 0.9998874
III 0.9997934
IV 0.9994648
not reported 0.9997007
## extract the coefficients from the model and exponentiate
exp(coef(fit)) # interpret as relative risks
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I 0.5714048 1.0000403 1.0000569
II 1.2142694 1.0000400 0.9999908
III 0.5475766 1.0000955 1.0001105
IV 0.2380319 1.0003340 1.0003377
not reported 0.1666948 0.9997835 0.9997859
OTUPrevotella melaninogenica
I 1.0000398
II 1.0000416
III 1.0000950
IV 1.0003339
not reported 0.9997835
# save fitted logits
pp <- fitted(fit)
# preditive data
dotu <- data.frame(OTU = c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"))
predict(fit, newdata = dotu, "probs") # only TINY differences
0 I II III IV not reported
1 0.2675244 0.1528647 0.3248466 0.1464901 0.06367934 0.04459491
2 0.2675124 0.1528640 0.3248451 0.1464975 0.06369775 0.04458326
3 0.2675153 0.1528682 0.3248326 0.1465013 0.06369868 0.04458385
4 0.2675123 0.1528639 0.3248455 0.1464974 0.06369772 0.04458324
## store the predicted probabilities for each value of ses
pp.otu <-cbind(dotu, predict(fit, newdata = dotu, "probs", se = TRUE))
## calculate the mean probabilities within each level of OTU
by(pp.otu[, 2:7], pp.otu$OTU, colMeans)
pp.otu$OTU: Campylobacter concisus
0 I II III IV not reported
0.26751530 0.15286823 0.32483264 0.14650130 0.06369868 0.04458385
------------------------------------------------------------
pp.otu$OTU: Fusobacterium nucleatum
0 I II III IV not reported
0.26752435 0.15286471 0.32484663 0.14649006 0.06367934 0.04459491
------------------------------------------------------------
pp.otu$OTU: Prevotella melaninogenica
0 I II III IV not reported
0.26751228 0.15286389 0.32484549 0.14649737 0.06369772 0.04458324
------------------------------------------------------------
pp.otu$OTU: Streptococcus spp.
0 I II III IV not reported
0.26751238 0.15286402 0.32484508 0.14649750 0.06369775 0.04458326
fit <- nnet::multinom(tumor.stage ~ OTU + Abundance, data=dat.16s.s)
# weights: 36 (25 variable)
initial value 1125.224947
iter 10 value 1036.591057
iter 20 value 1003.572457
iter 30 value 1002.703847
final value 1002.703282
converged
summary(fit)
Call:
nnet::multinom(formula = tumor.stage ~ OTU + Abundance, data = dat.16s.s)
Coefficients:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I -0.6019482 -0.29770929 0.036768981
II 0.1493511 -0.31416197 0.038932244
III -0.6047665 -0.01998832 0.002120648
IV -1.4642044 -0.20835878 0.025000279
not reported -1.8374350 -0.31744429 0.039621982
OTUPrevotella melaninogenica Abundance
I -0.024130985 0.0025034384
II -0.025266176 0.0026228398
III -0.002131836 0.0001920192
IV -0.018128606 0.0018251074
not reported -0.025219107 0.0026488611
Std. Errors:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I 0.2579305 0.4212163 0.3630178
II 0.2100176 0.3418866 0.2956093
III 0.2608095 0.4170931 0.3675700
IV 0.3543782 0.5765068 0.4989885
not reported 0.4118364 0.6771845 0.5793856
OTUPrevotella melaninogenica Abundance
I 0.3624989 0.001743779
II 0.2953697 0.001461370
III 0.3673553 0.001912311
IV 0.4982317 0.002392437
not reported 0.5780688 0.002659606
Residual Deviance: 2005.407
AIC: 2055.407
# obtain significance tests
z <- summary(fit)$coefficients/summary(fit)$standard.errors
z
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I -2.3337612 -0.70678481 0.101287006
II 0.7111362 -0.91890701 0.131701681
III -2.3188053 -0.04792294 0.005769373
IV -4.1317569 -0.36141601 0.050101911
not reported -4.4615648 -0.46877073 0.068386206
OTUPrevotella melaninogenica Abundance
I -0.066568429 1.4356399
II -0.085540865 1.7947819
III -0.005803199 0.1004121
IV -0.036385893 0.7628655
not reported -0.043626480 0.9959599
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I 1.960823e-02 0.4797002 0.9193226
II 4.769998e-01 0.3581442 0.8952203
III 2.040560e-02 0.9617777 0.9953967
IV 3.600011e-05 0.7177885 0.9600412
not reported 8.136333e-06 0.6392335 0.9454782
OTUPrevotella melaninogenica Abundance
I 0.9469253 0.15110484
II 0.9318314 0.07268846
III 0.9953697 0.92001713
IV 0.9709747 0.44554363
not reported 0.9652021 0.31926963
## extract the coefficients from the model and exponentiate
exp(coef(fit)) # interpret as relative risks
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I 0.5477435 0.7425172 1.037453
II 1.1610806 0.7304007 1.039700
III 0.5462019 0.9802101 1.002123
IV 0.2312619 0.8119157 1.025315
not reported 0.1592253 0.7280072 1.040417
OTUPrevotella melaninogenica Abundance
I 0.9761578 1.002507
II 0.9750503 1.002626
III 0.9978704 1.000192
IV 0.9820347 1.001827
not reported 0.9750962 1.002652
# save fitted logits
pp <- fitted(fit)
# predit data
dotu <- data.frame(OTU = c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), Abundance = mean(dat.16s.s$Abundance))
predict(fit, newdata = dotu, "probs") # bigger differences
0 I II III IV not reported
1 0.2555485 0.1574622 0.3356602 0.1408471 0.06439470 0.04608727
2 0.3036720 0.1389358 0.2913349 0.1640584 0.06212873 0.03987017
3 0.2497993 0.1596845 0.3411347 0.1379707 0.06453950 0.04687126
4 0.2593699 0.1560065 0.3321798 0.1426489 0.06418347 0.04561154
## look at the averaged predicted probabilities for different values of the continuous predictor variable Abundnace within each level of OTU
dabund <- data.frame(
OTU = rep(c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), each = 51),
Abundance = rep(seq(0, 500,10), 4)
)
pp.abund <-cbind(dabund, predict(fit, newdata = dabund, "probs", se = TRUE))
## calculate the mean probabilities within each level of OTU
by(pp.abund[, 3:8], pp.abund$OTU, colMeans)
pp.abund$OTU: Campylobacter concisus
0 I II III IV not reported
0.18073001 0.18620713 0.40831655 0.10306018 0.06526185 0.05642428
------------------------------------------------------------
pp.abund$OTU: Fusobacterium nucleatum
0 I II III IV not reported
0.18548588 0.18439024 0.40347761 0.10555599 0.06537256 0.05571772
------------------------------------------------------------
pp.abund$OTU: Prevotella melaninogenica
0 I II III IV not reported
0.18865464 0.18318669 0.40040147 0.10713583 0.06532547 0.05529589
------------------------------------------------------------
pp.abund$OTU: Streptococcus spp.
0 I II III IV not reported
0.22655381 0.16863991 0.36313616 0.12645569 0.06522761 0.04998682
## melt data set to long for ggplot2
lpp <- melt(pp.abund, id.vars = c("OTU", "Abundance"), value.name = "probability")
## plot predicted probabilities across Abundance values for each level of OTU
## facetted by tumor.stage
ggplot(lpp, aes(x = Abundance, y = probability, colour = OTU)) +
geom_line() +
facet_grid(variable ~., scales="free")+
labs(y="Probability of Tumor Stage",
title="Tumor stage likelihood with bacteria abundance and OTU")+
theme(
panel.grid = element_blank()
)
fit <- nnet::multinom(tumor.stage ~ OTU + Abundance + OTU:Abundance, data=dat.16s.s)
# weights: 54 (40 variable)
initial value 1125.224947
iter 10 value 1090.589208
iter 20 value 1039.256245
iter 30 value 996.413681
iter 40 value 992.928158
iter 50 value 991.429386
iter 60 value 991.127307
iter 70 value 990.723318
final value 990.722744
converged
summary(fit)
Call:
nnet::multinom(formula = tumor.stage ~ OTU + Abundance + OTU:Abundance,
data = dat.16s.s)
Coefficients:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I -0.59669951 -0.26093056 -0.02367244
II -0.04925272 0.09502819 0.20651525
III -0.77567930 0.34796855 0.26293966
IV -1.04748982 -0.51444215 -0.38084125
not reported -1.70228029 -0.40001948 -0.08221021
OTUPrevotella melaninogenica Abundance
I -0.13897798 0.005360968
II 0.13369547 0.017833598
III 0.05111365 0.015423279
IV -0.71749336 -0.414573888
not reported -0.37627244 -0.022283246
OTUStreptococcus spp.:Abundance
I -0.00333574
II -0.01676429
III -0.01689038
IV 0.41549620
not reported 0.02438395
OTUCampylobacter concisus:Abundance
I 0.0185943761
II -0.0004488833
III -0.1622707887
IV 0.4097502356
not reported 0.0170529960
OTUPrevotella melaninogenica:Abundance
I 0.001887565
II -0.012888890
III -0.010003735
IV 0.425826443
not reported 0.032563443
Std. Errors:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I 0.2706201 0.4738624 0.3783916
II 0.2281113 0.3810820 0.3128107
III 0.2792175 0.4573298 0.3880796
IV 0.3761363 0.6418971 0.5206815
not reported 0.4330984 0.7670192 0.6017282
OTUPrevotella melaninogenica Abundance
I 0.4029752 0.012475637
II 0.3320564 0.009770337
III 0.4104489 0.010117141
IV 0.5654752 0.363818099
not reported 0.6506491 0.047707083
OTUStreptococcus spp.:Abundance
I 0.012623695
II 0.009903424
III 0.010350314
IV 0.363828064
not reported 0.047801354
OTUCampylobacter concisus:Abundance
I 0.02927336
II 0.02727100
III 0.17685556
IV 0.36829726
not reported 0.08268464
OTUPrevotella melaninogenica:Abundance
I 0.01392258
II 0.01129880
III 0.01205157
IV 0.36388004
not reported 0.04830831
Residual Deviance: 1981.445
AIC: 2061.445
# obtain significance tests
z <- summary(fit)$coefficients/summary(fit)$standard.errors
z
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I -2.2049339 -0.5506462 -0.06256069
II -0.2159152 0.2493642 0.66019236
III -2.7780470 0.7608701 0.67754060
IV -2.7848677 -0.8014402 -0.73142846
not reported -3.9304699 -0.5215247 -0.13662350
OTUPrevotella melaninogenica Abundance
I -0.3448798 0.4297150
II 0.4026288 1.8252797
III 0.1245311 1.5244702
IV -1.2688326 -1.1395087
not reported -0.5783032 -0.4670847
OTUStreptococcus spp.:Abundance
I -0.2642443
II -1.6927768
III -1.6318716
IV 1.1420125
not reported 0.5101100
OTUCampylobacter concisus:Abundance
I 0.6351980
II -0.0164601
III -0.9175329
IV 1.1125530
not reported 0.2062414
OTUPrevotella melaninogenica:Abundance
I 0.1355758
II -1.1407307
III -0.8300775
IV 1.1702385
not reported 0.6740754
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I 2.745873e-02 0.5818762 0.9501163
II 8.290538e-01 0.8030791 0.5091304
III 5.468671e-03 0.4467347 0.4980630
IV 5.354957e-03 0.4228768 0.4645175
not reported 8.478001e-05 0.6020013 0.8913284
OTUPrevotella melaninogenica Abundance
I 0.7301848 0.66740297
II 0.6872214 0.06795883
III 0.9008948 0.12739130
IV 0.2045008 0.25449104
not reported 0.5630595 0.64043931
OTUStreptococcus spp.:Abundance
I 0.79159164
II 0.09049797
III 0.10270655
IV 0.25344882
not reported 0.60997441
OTUCampylobacter concisus:Abundance
I 0.5252993
II 0.9868673
III 0.3588635
IV 0.2659004
not reported 0.8366023
OTUPrevotella melaninogenica:Abundance
I 0.8921567
II 0.2539820
III 0.4064950
IV 0.2419050
not reported 0.5002633
## extract the coefficients from the model and exponentiate
exp(coef(fit)) # interpret as relative risks
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
I 0.5506260 0.7703344 0.9766056
II 0.9519405 1.0996899 1.2293865
III 0.4603909 1.4161877 1.3007482
IV 0.3508173 0.5978340 0.6832864
not reported 0.1822674 0.6703070 0.9210783
OTUPrevotella melaninogenica Abundance
I 0.8702472 1.0053754
II 1.1430447 1.0179936
III 1.0524425 1.0155428
IV 0.4879739 0.6606217
not reported 0.6864153 0.9779632
OTUStreptococcus spp.:Abundance
I 0.9966698
II 0.9833755
III 0.9832515
IV 1.5151224
not reported 1.0246837
OTUCampylobacter concisus:Abundance
I 1.0187683
II 0.9995512
III 0.8502109
IV 1.5064415
not reported 1.0171992
OTUPrevotella melaninogenica:Abundance
I 1.0018893
II 0.9871938
III 0.9900461
IV 1.5308551
not reported 1.0330994
# save fitted logits
pp <- fitted(fit)
# predit data
gmeans <- dat.16s.s %>% group_by(OTU) %>% summarise(M = mean(Abundance))
dotu <- data.frame(OTU = c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), Abundance = gmeans$M)
predict(fit, newdata = dotu, "probs") # bigger differences
0 I II III IV not reported
1 0.2742667 0.1662494 0.3594195 0.1664778 5.703835e-05 0.03352950
2 0.2692405 0.1518491 0.3276064 0.1428078 6.429135e-02 0.04420488
3 0.2763845 0.1568249 0.3363098 0.1190824 6.553912e-02 0.04585922
4 0.2639438 0.1540951 0.3286240 0.1482418 6.139738e-02 0.04369800
## look at the averaged predicted probabilities for different values of the continuous predictor variable Abundnace within each level of OTU
dabund <- data.frame(
OTU = rep(c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), each = 51),
Abundance = rep(seq(0, 500,10), 4)
)
pp.abund <-cbind(dabund, predict(fit, newdata = dabund, "probs", se = TRUE))
## calculate the mean probabilities within each level of OTU
by(pp.abund[, 3:8], pp.abund$OTU, colMeans)
pp.abund$OTU: Campylobacter concisus
0 I II III IV not reported
0.044166453 0.646759905 0.290804978 0.004111035 0.008386006 0.005771623
------------------------------------------------------------
pp.abund$OTU: Fusobacterium nucleatum
0 I II III IV not reported
0.054825510 0.046476481 0.708319996 0.184358597 0.002000020 0.004019395
------------------------------------------------------------
pp.abund$OTU: Prevotella melaninogenica
0 I II III IV not reported
0.1006484 0.1888815 0.2510530 0.1237848 0.2187111 0.1169212
------------------------------------------------------------
pp.abund$OTU: Streptococcus spp.
0 I II III IV not reported
0.24877956 0.17729604 0.33879008 0.11761193 0.06536973 0.05215265
## melt data set to long for ggplot2
lpp <- melt(pp.abund, id.vars = c("OTU", "Abundance"), value.name = "probability")
## plot predicted probabilities across Abundance values for each level of OTU
## facetted by tumor.stage
ggplot(lpp, aes(x = Abundance, y = probability, colour = OTU)) +
geom_line() +
facet_grid(variable ~., scales="free")+
labs(y="Probability of Tumor Stage",
title="Tumor stage likelihood with bacteria abundance and OTU")+
theme(
panel.grid = element_blank()
)
# in long format
table(dat.rna$tumor.stage)
I II III IV not reported
16359 55309 39729 6232 17138
# by subject
dat <- dat.rna %>% filter(OTU == "Fusobacterium nucleatum")
table(dat$tumor.stage)
I II III IV not reported
21 71 51 8 22
sum(table(dat$tumor.stage)) # sample size met
[1] 173
dat.rna.s$Abundancej <- dat.rna.s$Abundance+0.01
mean.dat <- dat.rna.s %>%
group_by(tumor.stage, OTU) %>%
summarize(M = mean(Abundance))
`summarise()` has grouped output by 'tumor.stage'. You can override using the `.groups` argument.
ggplot(dat.rna.s, aes(x=tumor.stage, y=Abundancej))+
geom_violin()+
geom_jitter(alpha=0.25,width = 0.25)+
geom_point(data=mean.dat, aes(x=tumor.stage, y = M), size=2, alpha =0.9, color="red")+
labs(x="Tumor Stage",
title="Distribution of abundance across tumor stage",
subtitle="Red dot is average abundnace")+
scale_y_continuous(
breaks=c(0, 100, 1000, 10000, 100000, 150000),
#limits = c(0,500),
trans="pseudo_log")+
facet_wrap(.~OTU, nrow=1)+
theme_classic()
fit <- nnet::multinom(tumor.stage ~ OTU, data=dat.rna.s)
# weights: 25 (16 variable)
initial value 2505.894830
iter 10 value 2169.658898
final value 2157.953669
converged
summary(fit)
Call:
nnet::multinom(formula = tumor.stage ~ OTU, data = dat.rna.s)
Coefficients:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 1.21814839 3.650884e-05 4.293757e-06
III 0.88731491 6.336748e-06 1.097171e-04
IV -0.96505808 1.423347e-05 -8.499898e-05
not reported 0.04652072 -2.047866e-05 6.233591e-05
OTUPrevotella melaninogenica
II 1.491934e-04
III -2.261544e-05
IV -8.737889e-05
not reported 5.797583e-05
Std. Errors:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.2484024 0.2683053 0.3512973
III 0.2592812 0.2800559 0.3666795
IV 0.4154712 0.4487604 0.5875797
not reported 0.3050797 0.3295244 0.4314487
OTUPrevotella melaninogenica
II 0.3512961
III 0.3666848
IV 0.5875829
not reported 0.4314511
Residual Deviance: 4315.907
AIC: 4347.907
# obtain significance tests
z <- summary(fit)$coefficients/summary(fit)$standard.errors
z
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 4.9039323 1.360720e-04 1.222258e-05
III 3.4222112 2.262672e-05 2.992181e-04
IV -2.3228036 3.171732e-05 -1.446595e-04
not reported 0.1524871 -6.214612e-05 1.444805e-04
OTUPrevotella melaninogenica
II 4.246942e-04
III -6.167541e-05
IV -1.487090e-04
not reported 1.343741e-04
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 9.393689e-07 0.9998914 0.9999902
III 6.211403e-04 0.9999819 0.9997613
IV 2.018971e-02 0.9999747 0.9998846
not reported 8.788028e-01 0.9999504 0.9998847
OTUPrevotella melaninogenica
II 0.9996611
III 0.9999508
IV 0.9998813
not reported 0.9998928
## extract the coefficients from the model and exponentiate
exp(coef(fit)) # interpret as relative risks
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 3.3809218 1.0000365 1.000004
III 2.4285999 1.0000063 1.000110
IV 0.3809611 1.0000142 0.999915
not reported 1.0476198 0.9999795 1.000062
OTUPrevotella melaninogenica
II 1.0001492
III 0.9999774
IV 0.9999126
not reported 1.0000580
# save fitted logits
pp <- fitted(fit)
# preditive data
dotu <- data.frame(OTU = c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"))
predict(fit, newdata = dotu, "probs") # only TINY differences
I II III IV not reported
1 0.1213872 0.4104005 0.2948009 0.04624379 0.1271676
2 0.1213854 0.4104094 0.2947984 0.04624376 0.1271631
3 0.1213826 0.4103867 0.2948220 0.04623810 0.1271707
4 0.1213801 0.4104380 0.2947771 0.04623707 0.1271676
## store the predicted probabilities for each value of ses
pp.otu <-cbind(dotu, predict(fit, newdata = dotu, "probs", se = TRUE))
## calculate the mean probabilities within each level of OTU
by(pp.otu[, 2:6], pp.otu$OTU, colMeans)
pp.otu$OTU: Campylobacter concisus
I II III IV not reported
0.1213826 0.4103867 0.2948220 0.0462381 0.1271707
------------------------------------------------------------
pp.otu$OTU: Fusobacterium nucleatum
I II III IV not reported
0.12138718 0.41040055 0.29480088 0.04624379 0.12716761
------------------------------------------------------------
pp.otu$OTU: Prevotella melaninogenica
I II III IV not reported
0.12138015 0.41043802 0.29477715 0.04623707 0.12716762
------------------------------------------------------------
pp.otu$OTU: Streptococcus spp.
I II III IV not reported
0.12138537 0.41040941 0.29479835 0.04624376 0.12716311
fit <- nnet::multinom(tumor.stage ~ OTU + Abundance, data=dat.rna.s)
# weights: 30 (20 variable)
initial value 2505.894830
iter 10 value 2187.209898
iter 20 value 2156.177222
iter 30 value 2154.856368
iter 30 value 2154.856366
iter 30 value 2154.856366
final value 2154.856366
converged
summary(fit)
Call:
nnet::multinom(formula = tumor.stage ~ OTU + Abundance, data = dat.rna.s)
Coefficients:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 1.22389993 -0.005365375 -0.005610842
III 0.88005998 0.006916057 0.007172288
IV -1.13005106 0.163682102 0.164640189
not reported 0.03100875 0.014915539 0.015375416
OTUPrevotella melaninogenica Abundance
II -0.004611915 -1.845362e-05
III 0.006166936 1.645888e-05
IV 0.160760060 6.227932e-05
not reported 0.013607464 2.892287e-05
Std. Errors:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.06528262 0.07029829 0.09577046
III 0.06872088 0.07468645 0.10257516
IV 0.08828314 0.06919437 0.01460932
not reported 0.07659641 0.07887242 0.10169916
OTUPrevotella melaninogenica Abundance
II 0.09597200 9.929516e-05
III 0.10267821 9.011621e-05
IV 0.01459417 9.001103e-05
not reported 0.10185631 9.174721e-05
Residual Deviance: 4309.713
AIC: 4349.713
# obtain significance tests
z <- summary(fit)$coefficients/summary(fit)$standard.errors
z
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 18.7477134 -0.07632299 -0.05858635
III 12.8062973 0.09260123 0.06992227
IV -12.8003041 2.36554091 11.26953459
not reported 0.4048329 0.18910970 0.15118528
OTUPrevotella melaninogenica Abundance
II -0.04805479 -0.1858461
III 0.06006081 0.1826407
IV 11.01536518 0.6919076
not reported 0.13359471 0.3152452
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.0000000 0.93916214 0.9532816
III 0.0000000 0.92622036 0.9442555
IV 0.0000000 0.01800375 0.0000000
not reported 0.6856003 0.85000684 0.8798296
OTUPrevotella melaninogenica Abundance
II 0.9616726 0.8525654
III 0.9521072 0.8550800
IV 0.0000000 0.4889954
not reported 0.8937231 0.7525755
## extract the coefficients from the model and exponentiate
exp(coef(fit)) # interpret as relative risks
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 3.4004233 0.994649 0.9944049
III 2.4110443 1.006940 1.0071981
IV 0.3230168 1.177840 1.1789688
not reported 1.0314945 1.015027 1.0154942
OTUPrevotella melaninogenica Abundance
II 0.9953987 0.9999815
III 1.0061860 1.0000165
IV 1.1744031 1.0000623
not reported 1.0137005 1.0000289
# save fitted logits
pp <- fitted(fit)
# predit data
dotu <- data.frame(OTU = c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), Abundance = mean(dat.rna.s$Abundance))
predict(fit, newdata = dotu, "probs") # bigger differences
I II III IV not reported
1 0.1223989 0.4150705 0.2958303 0.03990376 0.1267965
2 0.1213271 0.4092343 0.2952750 0.04658867 0.1275749
3 0.1213176 0.4091018 0.2953275 0.04662967 0.1276236
4 0.1213532 0.4096310 0.2951174 0.04646274 0.1274356
## look at the averaged predicted probabilities for different values of the continuous predictor variable Abundnace within each level of OTU
dabund <- data.frame(
OTU = rep(c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), each = 51),
Abundance = rep(seq(0, 500,10), 4)
)
pp.abund <-cbind(dabund, predict(fit, newdata = dabund, "probs", se = TRUE))
## calculate the mean probabilities within each level of OTU
by(pp.abund[, 3:7], pp.abund$OTU, colMeans)
pp.abund$OTU: Campylobacter concisus
I II III IV not reported
0.12126844 0.40817153 0.29570285 0.04690866 0.12794852
------------------------------------------------------------
pp.abund$OTU: Fusobacterium nucleatum
I II III IV not reported
0.12235613 0.41414970 0.29622294 0.04014477 0.12712646
------------------------------------------------------------
pp.abund$OTU: Prevotella melaninogenica
I II III IV not reported
0.12130445 0.40870080 0.29549344 0.04674087 0.12776045
------------------------------------------------------------
pp.abund$OTU: Streptococcus spp.
I II III IV not reported
0.12127805 0.40830412 0.29565054 0.04686745 0.12789984
## melt data set to long for ggplot2
lpp <- melt(pp.abund, id.vars = c("OTU", "Abundance"), value.name = "probability")
## plot predicted probabilities across Abundance values for each level of OTU
## facetted by tumor.stage
ggplot(lpp, aes(x = Abundance, y = probability, colour = OTU)) +
geom_line() +
facet_grid(variable ~., scales="free")+
labs(y="Probability of Tumor Stage",
title="Tumor stage likelihood with bacteria abundance and OTU")+
theme(
panel.grid = element_blank()
)
fit <- nnet::multinom(tumor.stage ~ OTU + Abundance + OTU:Abundance, data=dat.rna.s)
# weights: 45 (32 variable)
initial value 2505.894830
iter 10 value 2470.269015
iter 20 value 2262.688066
iter 30 value 2164.261982
iter 40 value 2149.458456
iter 50 value 2149.039292
iter 60 value 2149.032581
iter 60 value 2149.032574
iter 60 value 2149.032574
final value 2149.032574
converged
summary(fit)
Call:
nnet::multinom(formula = tumor.stage ~ OTU + Abundance + OTU:Abundance,
data = dat.rna.s)
Coefficients:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 1.22901888 -0.04369307 -0.05207418
III 0.88447777 -0.03058220 -0.08501825
IV -1.12626064 0.25030925 0.07721200
not reported 0.03722734 -0.02064624 -0.11097986
OTUPrevotella melaninogenica Abundance
II -0.05374046 -3.189722e-05
III -0.03067363 5.709804e-06
IV 0.11646654 5.137822e-05
not reported -0.07427308 1.598728e-05
OTUStreptococcus spp.:Abundance
II 0.002059400
III 0.002047309
IV -0.010330805
not reported 0.001876313
OTUCampylobacter concisus:Abundance
II 0.04055556
III 0.05030128
IV 0.04979592
not reported 0.05326765
OTUPrevotella melaninogenica:Abundance
II 0.001313536
III 0.001108104
IV 0.001260908
not reported 0.001697479
Std. Errors:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.06746550 0.07266861 0.09885696
III 0.07107516 0.07709257 0.10582027
IV 0.09393397 0.07154462 0.01606452
not reported 0.07946376 0.08093691 0.10382815
OTUPrevotella melaninogenica Abundance
II 0.09835665 8.938072e-05
III 0.10542499 7.501672e-05
IV 0.01597953 7.218668e-05
not reported 0.10410170 7.665209e-05
OTUStreptococcus spp.:Abundance
II 0.002182096
III 0.002198521
IV 0.008665140
not reported 0.002304902
OTUCampylobacter concisus:Abundance
II 0.06971465
III 0.06958134
IV 0.07019612
not reported 0.06960806
OTUPrevotella melaninogenica:Abundance
II 0.002474919
III 0.002498719
IV 0.002664542
not reported 0.002482280
Residual Deviance: 4298.065
AIC: 4362.065
# obtain significance tests
z <- summary(fit)$coefficients/summary(fit)$standard.errors
z
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 18.216999 -0.6012648 -0.5267628
III 12.444261 -0.3966945 -0.8034212
IV -11.989918 3.4986453 4.8063696
not reported 0.468482 -0.2550905 -1.0688802
OTUPrevotella melaninogenica Abundance
II -0.5463836 -0.35686917
III -0.2909522 0.07611376
IV 7.2884819 0.71174095
not reported -0.7134665 0.20856940
OTUStreptococcus spp.:Abundance
II 0.9437718
III 0.9312208
IV -1.1922259
not reported 0.8140533
OTUCampylobacter concisus:Abundance
II 0.5817365
III 0.7229133
IV 0.7093828
not reported 0.7652512
OTUPrevotella melaninogenica:Abundance
II 0.5307392
III 0.4434687
IV 0.4732175
not reported 0.6838384
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.0000000 0.5476636587 5.983583e-01
III 0.0000000 0.6915927458 4.217313e-01
IV 0.0000000 0.0004676283 1.536957e-06
not reported 0.6394399 0.7986532214 2.851236e-01
OTUPrevotella melaninogenica Abundance
II 5.848023e-01 0.7211898
III 7.710879e-01 0.9393286
IV 3.135270e-13 0.4766252
not reported 4.755571e-01 0.8347844
OTUStreptococcus spp.:Abundance
II 0.3452863
III 0.3517394
IV 0.2331727
not reported 0.4156144
OTUCampylobacter concisus:Abundance
II 0.5607442
III 0.4697331
IV 0.4780869
not reported 0.4441220
OTUPrevotella melaninogenica:Abundance
II 0.5955995
III 0.6574268
IV 0.6360580
not reported 0.4940772
## extract the coefficients from the model and exponentiate
exp(coef(fit)) # interpret as relative risks
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 3.4178746 0.9572477 0.9492585
III 2.4217194 0.9698807 0.9184955
IV 0.3242435 1.2844226 1.0802711
not reported 1.0379290 0.9795654 0.8949568
OTUPrevotella melaninogenica Abundance
II 0.9476780 0.9999681
III 0.9697920 1.0000057
IV 1.1235199 1.0000514
not reported 0.9284181 1.0000160
OTUStreptococcus spp.:Abundance
II 1.0020615
III 1.0020494
IV 0.9897224
not reported 1.0018781
OTUCampylobacter concisus:Abundance
II 1.041389
III 1.051588
IV 1.051057
not reported 1.054712
OTUPrevotella melaninogenica:Abundance
II 1.001314
III 1.001109
IV 1.001262
not reported 1.001699
# save fitted logits
pp <- fitted(fit)
# predit data
gmeans <- dat.rna.s %>% group_by(OTU) %>% summarise(M = mean(Abundance))
dotu <- data.frame(OTU = c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), Abundance = gmeans$M)
predict(fit, newdata = dotu, "probs") # bigger differences
I II III IV not reported
1 0.1229412 0.4051451 0.2996796 0.04227536 0.1299588
2 0.1211784 0.4131060 0.2967207 0.04097089 0.1280240
3 0.1019677 0.4181505 0.3033560 0.04764329 0.1288825
4 0.1175653 0.4131520 0.2963863 0.04655773 0.1263387
## look at the averaged predicted probabilities for different values of the continuous predictor variable Abundnace within each level of OTU
dabund <- data.frame(
OTU = rep(c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), each = 51),
Abundance = rep(seq(0, 500,10), 4)
)
pp.abund <-cbind(dabund, predict(fit, newdata = dabund, "probs", se = TRUE))
## calculate the mean probabilities within each level of OTU
by(pp.abund[, 3:7], pp.abund$OTU, colMeans)
pp.abund$OTU: Campylobacter concisus
I II III IV not reported
0.007163659 0.120685482 0.417339347 0.058788398 0.396023113
------------------------------------------------------------
pp.abund$OTU: Fusobacterium nucleatum
I II III IV not reported
0.12215280 0.41418934 0.29624275 0.04012074 0.12729436
------------------------------------------------------------
pp.abund$OTU: Prevotella melaninogenica
I II III IV not reported
0.09601775 0.42222526 0.29354022 0.04785669 0.14036007
------------------------------------------------------------
pp.abund$OTU: Streptococcus spp.
I II III IV not reported
0.085178216 0.447730219 0.323528352 0.009125871 0.134437342
## melt data set to long for ggplot2
lpp <- melt(pp.abund, id.vars = c("OTU", "Abundance"), value.name = "probability")
## plot predicted probabilities across Abundance values for each level of OTU
## facetted by tumor.stage
ggplot(lpp, aes(x = Abundance, y = probability, colour = OTU)) +
geom_line() +
facet_grid(variable ~., scales="free")+
labs(y="Probability of Tumor Stage",
title="Tumor stage likelihood with bacteria abundance and OTU")+
theme(
panel.grid = element_blank()
)
# in long format
table(dat.wgs$tumor.stage)
I II III IV not reported
14022 38950 21812 4674 28823
# by subject
dat <- dat.wgs %>% filter(OTU == "Fusobacterium nucleatum")
table(dat$tumor.stage)
I II III IV not reported
18 50 28 6 37
sum(table(dat$tumor.stage)) # sample size met
[1] 139
dat.wgs.s$Abundancej <- dat.wgs.s$Abundance+0.01
mean.dat <- dat.wgs.s %>%
group_by(tumor.stage, OTU) %>%
summarize(M = mean(Abundance))
`summarise()` has grouped output by 'tumor.stage'. You can override using the `.groups` argument.
ggplot(dat.wgs.s, aes(x=tumor.stage, y=Abundancej))+
geom_violin()+
geom_jitter(alpha=0.25,width = 0.25)+
geom_point(data=mean.dat, aes(x=tumor.stage, y = M), size=2, alpha =0.9, color="red")+
labs(x="Tumor Stage",
title="Distribution of abundance across tumor stage",
subtitle="Red dot is average abundnace")+
scale_y_continuous(
breaks=c(0, 100, 1000, 10000, 100000),
#limits = c(0,500),
trans="pseudo_log")+
facet_wrap(.~OTU, nrow=1)+
theme_classic()
fit <- nnet::multinom(tumor.stage ~ OTU, data=dat.wgs.s)
# weights: 25 (16 variable)
initial value 2013.406828
iter 10 value 1814.622781
final value 1805.470113
converged
summary(fit)
Call:
nnet::multinom(formula = tumor.stage ~ OTU, data = dat.wgs.s)
Coefficients:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 1.0215028 1.890488e-04 -4.336280e-05
III 0.4419184 -5.931869e-05 -7.276291e-05
IV -1.0997981 1.446667e-03 2.549813e-04
not reported 0.7205142 6.279291e-05 1.903398e-04
OTUPrevotella melaninogenica
II 3.565444e-04
III -1.755839e-04
IV 2.466118e-04
not reported -7.678646e-05
Std. Errors:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.2748659 0.2968910 0.3887233
III 0.3020895 0.3262976 0.4272254
IV 0.4715916 0.5093426 0.6669046
not reported 0.2873596 0.3103864 0.4063847
OTUPrevotella melaninogenica
II 0.3887180
III 0.4272352
IV 0.6669143
not reported 0.4063988
Residual Deviance: 3610.94
AIC: 3642.94
# obtain significance tests
z <- summary(fit)$coefficients/summary(fit)$standard.errors
z
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 3.716367 0.0006367617 -0.0001115518
III 1.462873 -0.0001817932 -0.0001703150
IV -2.332098 0.0028402622 0.0003823355
not reported 2.507361 0.0002023056 0.0004683735
OTUPrevotella melaninogenica
II 0.0009172313
III -0.0004109772
IV 0.0003697804
not reported -0.0001889436
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.0002021078 0.9994919 0.9999110
III 0.1435022029 0.9998549 0.9998641
IV 0.0196955171 0.9977338 0.9996949
not reported 0.0121636404 0.9998386 0.9996263
OTUPrevotella melaninogenica
II 0.9992682
III 0.9996721
IV 0.9997050
not reported 0.9998492
## extract the coefficients from the model and exponentiate
exp(coef(fit)) # interpret as relative risks
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 2.7773654 1.0001891 0.9999566
III 1.5556888 0.9999407 0.9999272
IV 0.3329383 1.0014477 1.0002550
not reported 2.0554899 1.0000628 1.0001904
OTUPrevotella melaninogenica
II 1.0003566
III 0.9998244
IV 1.0002466
not reported 0.9999232
# save fitted logits
pp <- fitted(fit)
# preditive data
dotu <- data.frame(OTU = c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"))
predict(fit, newdata = dotu, "probs") # only TINY differences
I II III IV not reported
1 0.1295088 0.3596933 0.2014754 0.04311844 0.2662041
2 0.1294913 0.3597127 0.2014362 0.04317503 0.2661848
3 0.1295047 0.3596664 0.2014544 0.04312809 0.2662464
4 0.1294980 0.3597917 0.2014233 0.04312550 0.2661615
## store the predicted probabilities for each value of ses
pp.otu <-cbind(dotu, predict(fit, newdata = dotu, "probs", se = TRUE))
## calculate the mean probabilities within each level of OTU
by(pp.otu[, 2:6], pp.otu$OTU, colMeans)
pp.otu$OTU: Campylobacter concisus
I II III IV not reported
0.12950474 0.35966639 0.20145442 0.04312809 0.26624637
------------------------------------------------------------
pp.otu$OTU: Fusobacterium nucleatum
I II III IV not reported
0.12950881 0.35969328 0.20147541 0.04311844 0.26620405
------------------------------------------------------------
pp.otu$OTU: Prevotella melaninogenica
I II III IV not reported
0.1294980 0.3597917 0.2014233 0.0431255 0.2661615
------------------------------------------------------------
pp.otu$OTU: Streptococcus spp.
I II III IV not reported
0.12949130 0.35971266 0.20143622 0.04317503 0.26618478
fit <- nnet::multinom(tumor.stage ~ OTU + Abundance, data=dat.wgs.s)
# weights: 30 (20 variable)
initial value 2013.406828
iter 10 value 1850.225458
iter 20 value 1802.837098
iter 30 value 1802.117376
iter 30 value 1802.117364
iter 30 value 1802.117364
final value 1802.117364
converged
summary(fit)
Call:
nnet::multinom(formula = tumor.stage ~ OTU + Abundance, data = dat.wgs.s)
Coefficients:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.9739095 0.02291961 0.03232894
III 0.3619615 0.05176408 0.06303580
IV -1.1743718 0.04799757 0.05885775
not reported 0.6568452 0.03692452 0.04742615
OTUPrevotella melaninogenica Abundance
II -0.01993434 0.0003775308
III -0.02946379 0.0004040709
IV -0.02812319 0.0004017778
not reported -0.02452024 0.0003940022
Std. Errors:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.05465548 0.06411303 0.038956744
III 0.04574239 0.03721564 0.009434666
IV 0.04711372 0.03668676 0.003790582
not reported 0.04972819 0.05131864 0.027485605
OTUPrevotella melaninogenica Abundance
II 0.037144838 0.0003143339
III 0.008830521 0.0003143168
IV 0.003547078 0.0003151801
not reported 0.026197969 0.0003143136
Residual Deviance: 3604.235
AIC: 3644.235
# obtain significance tests
z <- summary(fit)$coefficients/summary(fit)$standard.errors
z
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 17.819064 0.3574875 0.8298677
III 7.913043 1.3909226 6.6812964
IV -24.926323 1.3083078 15.5273637
not reported 13.208710 0.7195148 1.7254903
OTUPrevotella melaninogenica Abundance
II -0.5366652 1.201050
III -3.3365860 1.285553
IV -7.9285514 1.274756
not reported -0.9359598 1.253532
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.000000e+00 0.7207269 4.066136e-01
III 2.442491e-15 0.1642489 2.368372e-11
IV 0.000000e+00 0.1907689 0.000000e+00
not reported 0.000000e+00 0.4718238 8.443914e-02
OTUPrevotella melaninogenica Abundance
II 5.914989e-01 0.2297318
III 8.481420e-04 0.1985991
IV 2.220446e-15 0.2023955
not reported 3.492939e-01 0.2100121
## extract the coefficients from the model and exponentiate
exp(coef(fit)) # interpret as relative risks
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 2.6482776 1.023184 1.032857
III 1.4361436 1.053127 1.065065
IV 0.3090131 1.049168 1.060624
not reported 1.9286981 1.037615 1.048569
OTUPrevotella melaninogenica Abundance
II 0.9802630 1.000378
III 0.9709660 1.000404
IV 0.9722686 1.000402
not reported 0.9757779 1.000394
# save fitted logits
pp <- fitted(fit)
# predit data
dotu <- data.frame(OTU = c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), Abundance = mean(dat.wgs.s$Abundance))
predict(fit, newdata = dotu, "probs") # bigger differences
I II III IV not reported
1 0.1208789 0.3666360 0.2007297 0.04315515 0.2686003
2 0.1172000 0.3637192 0.2049603 0.04389903 0.2702215
3 0.1161438 0.3638489 0.2054156 0.04397845 0.2706133
4 0.1234424 0.3670217 0.1990351 0.04284823 0.2676526
## look at the averaged predicted probabilities for different values of the continuous predictor variable Abundnace within each level of OTU
dabund <- data.frame(
OTU = rep(c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), each = 51),
Abundance = rep(seq(0, 500,10), 4)
)
pp.abund <-cbind(dabund, predict(fit, newdata = dabund, "probs", se = TRUE))
## calculate the mean probabilities within each level of OTU
by(pp.abund[, 3:7], pp.abund$OTU, colMeans)
pp.abund$OTU: Campylobacter concisus
I II III IV not reported
0.12072615 0.36244970 0.20403818 0.04369439 0.26909159
------------------------------------------------------------
pp.abund$OTU: Fusobacterium nucleatum
I II III IV not reported
0.12561799 0.36514482 0.19933957 0.04286691 0.26703071
------------------------------------------------------------
pp.abund$OTU: Prevotella melaninogenica
I II III IV not reported
0.12826585 0.36548603 0.19763363 0.04255707 0.26605742
------------------------------------------------------------
pp.abund$OTU: Streptococcus spp.
I II III IV not reported
0.12181779 0.36230325 0.20357630 0.04361342 0.26868925
## melt data set to long for ggplot2
lpp <- melt(pp.abund, id.vars = c("OTU", "Abundance"), value.name = "probability")
## plot predicted probabilities across Abundance values for each level of OTU
## facetted by tumor.stage
ggplot(lpp, aes(x = Abundance, y = probability, colour = OTU)) +
geom_line() +
facet_grid(variable ~., scales="free")+
labs(y="Probability of Tumor Stage",
title="Tumor stage likelihood with bacteria abundance and OTU")+
theme(
panel.grid = element_blank()
)
fit <- nnet::multinom(tumor.stage ~ OTU + Abundance + OTU:Abundance, data=dat.wgs.s)
# weights: 45 (32 variable)
initial value 2013.406828
iter 10 value 1998.484917
iter 20 value 1834.136890
iter 30 value 1789.695310
iter 40 value 1787.562095
iter 50 value 1787.442956
iter 60 value 1787.436033
final value 1787.435678
converged
summary(fit)
Call:
nnet::multinom(formula = tumor.stage ~ OTU + Abundance + OTU:Abundance,
data = dat.wgs.s)
Coefficients:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.9892721 0.06616078 -0.03357802
III 0.3478732 0.04290083 -0.13258373
IV -1.2148038 0.08551966 -0.06154654
not reported 0.6642967 0.01745648 -0.03093131
OTUPrevotella melaninogenica Abundance
II -0.08794243 0.0002676666
III -0.01199882 0.0003284990
IV 0.03022360 0.0003357752
not reported -0.06568111 0.0003049560
OTUStreptococcus spp.:Abundance
II -1.176334e-03
III 1.906263e-04
IV 5.768491e-05
not reported 1.508244e-04
OTUCampylobacter concisus:Abundance
II 0.01878000
III 0.02269060
IV 0.02233686
not reported 0.02019126
OTUPrevotella melaninogenica:Abundance
II 0.0006269844
III 0.0005563821
IV 0.0005254570
not reported 0.0005905247
Std. Errors:
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.04024985 0.03108236 0.003934857
III 0.04579018 0.03560846 0.003454986
IV 0.04750239 0.03680560 0.003599811
not reported 0.04273090 0.03299592 0.003667227
OTUPrevotella melaninogenica Abundance
II 0.003141299 0.0006830277
III 0.003624091 0.0006823073
IV 0.003766989 0.0006827648
not reported 0.003332130 0.0006823819
OTUStreptococcus spp.:Abundance
II 0.0009092629
III 0.0007770727
IV 0.0008159857
not reported 0.0007769425
OTUCampylobacter concisus:Abundance
II 0.03773190
III 0.03771905
IV 0.03772803
not reported 0.03772863
OTUPrevotella melaninogenica:Abundance
II 0.001377664
III 0.001377451
IV 0.001381113
not reported 0.001377381
Residual Deviance: 3574.871
AIC: 3638.871
# obtain significance tests
z <- summary(fit)$coefficients/summary(fit)$standard.errors
z
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 24.578278 2.1285632 -8.533481
III 7.597113 1.2047933 -38.374611
IV -25.573531 2.3235505 -17.097158
not reported 15.546050 0.5290497 -8.434523
OTUPrevotella melaninogenica Abundance
II -27.995567 0.3918824
III -3.310851 0.4814532
IV 8.023277 0.4917875
not reported -19.711448 0.4468993
OTUStreptococcus spp.:Abundance
II -1.29372236
III 0.24531340
IV 0.07069353
not reported 0.19412555
OTUCampylobacter concisus:Abundance
II 0.4977220
III 0.6015687
IV 0.5920494
not reported 0.5351709
OTUPrevotella melaninogenica:Abundance
II 0.4551069
III 0.4039215
IV 0.3804590
not reported 0.4287301
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 0.000000e+00 0.03329042 0
III 3.019807e-14 0.22828311 0
IV 0.000000e+00 0.02014960 0
not reported 0.000000e+00 0.59677099 0
OTUPrevotella melaninogenica Abundance
II 0.000000e+00 0.6951451
III 9.301274e-04 0.6301945
IV 1.110223e-15 0.6228696
not reported 0.000000e+00 0.6549478
OTUStreptococcus spp.:Abundance
II 0.1957613
III 0.8062138
IV 0.9436417
not reported 0.8460776
OTUCampylobacter concisus:Abundance
II 0.6186800
III 0.5474613
IV 0.5538175
not reported 0.5925317
OTUPrevotella melaninogenica:Abundance
II 0.6490323
III 0.6862705
IV 0.7036047
not reported 0.6681197
## extract the coefficients from the model and exponentiate
exp(coef(fit)) # interpret as relative risks
(Intercept) OTUStreptococcus spp. OTUCampylobacter concisus
II 2.6892762 1.068398 0.9669795
III 1.4160526 1.043834 0.8758296
IV 0.2967682 1.089283 0.9403092
not reported 1.9431234 1.017610 0.9695422
OTUPrevotella melaninogenica Abundance
II 0.9158136 1.000268
III 0.9880729 1.000329
IV 1.0306850 1.000336
not reported 0.9364294 1.000305
OTUStreptococcus spp.:Abundance
II 0.9988244
III 1.0001906
IV 1.0000577
not reported 1.0001508
OTUCampylobacter concisus:Abundance
II 1.018957
III 1.022950
IV 1.022588
not reported 1.020396
OTUPrevotella melaninogenica:Abundance
II 1.000627
III 1.000557
IV 1.000526
not reported 1.000591
# save fitted logits
pp <- fitted(fit)
# predit data
gmeans <- dat.wgs.s %>% group_by(OTU) %>% summarise(M = mean(Abundance))
dotu <- data.frame(OTU = c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), Abundance = gmeans$M)
predict(fit, newdata = dotu, "probs") # bigger differences
I II III IV not reported
1 0.10246250 0.3687608 0.2074667 0.04382544 0.2774846
2 0.13192968 0.3418828 0.2068556 0.04459787 0.2747341
3 0.05553101 0.3790341 0.2210573 0.04887321 0.2955044
4 0.04526407 0.3947714 0.2211947 0.04676637 0.2920034
## look at the averaged predicted probabilities for different values of the continuous predictor variable Abundnace within each level of OTU
dabund <- data.frame(
OTU = rep(c("Fusobacterium nucleatum", "Streptococcus spp.", "Campylobacter concisus", "Prevotella melaninogenica"), each = 51),
Abundance = rep(seq(0, 500,10), 4)
)
pp.abund <-cbind(dabund, predict(fit, newdata = dabund, "probs", se = TRUE))
## calculate the mean probabilities within each level of OTU
by(pp.abund[, 3:7], pp.abund$OTU, colMeans)
pp.abund$OTU: Campylobacter concisus
I II III IV not reported
0.01609141 0.27664798 0.35704188 0.07270968 0.27750905
------------------------------------------------------------
pp.abund$OTU: Fusobacterium nucleatum
I II III IV not reported
0.1277517 0.3670714 0.1962556 0.0412055 0.2677158
------------------------------------------------------------
pp.abund$OTU: Prevotella melaninogenica
I II III IV not reported
0.11858942 0.36289653 0.20565457 0.04469163 0.26816784
------------------------------------------------------------
pp.abund$OTU: Streptococcus spp.
I II III IV not reported
0.13199751 0.30483885 0.22281492 0.04716287 0.29318585
## melt data set to long for ggplot2
lpp <- melt(pp.abund, id.vars = c("OTU", "Abundance"), value.name = "probability")
## plot predicted probabilities across Abundance values for each level of OTU
## facetted by tumor.stage
ggplot(lpp, aes(x = Abundance, y = probability, colour = OTU)) +
geom_line() +
facet_grid(variable ~., scales="free")+
labs(y="Probability of Tumor Stage",
title="Tumor stage likelihood with bacteria abundance and OTU")+
theme(
panel.grid = element_blank()
)
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cowplot_1.1.1 dendextend_1.14.0 ggdendro_0.1.22 reshape2_1.4.4
[5] car_3.0-10 carData_3.0-4 gvlma_1.0.0.3 patchwork_1.1.1
[9] viridis_0.5.1 viridisLite_0.3.0 gridExtra_2.3 xtable_1.8-4
[13] kableExtra_1.3.1 data.table_1.13.6 readxl_1.3.1 forcats_0.5.0
[17] stringr_1.4.0 dplyr_1.0.3 purrr_0.3.4 readr_1.4.0
[21] tidyr_1.1.2 tibble_3.0.5 ggplot2_3.3.3 tidyverse_1.3.0
[25] lmerTest_3.1-3 lme4_1.1-26 Matrix_1.2-18 vegan_2.5-7
[29] lattice_0.20-41 permute_0.9-5 phyloseq_1.34.0 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] minqa_1.2.4 colorspace_2.0-0 rio_0.5.16
[4] ellipsis_0.3.1 rprojroot_2.0.2 XVector_0.30.0
[7] fs_1.5.0 rstudioapi_0.13 farver_2.0.3
[10] fansi_0.4.2 lubridate_1.7.9.2 xml2_1.3.2
[13] codetools_0.2-18 splines_4.0.3 knitr_1.30
[16] ade4_1.7-16 jsonlite_1.7.2 nloptr_1.2.2.2
[19] broom_0.7.3 cluster_2.1.0 dbplyr_2.0.0
[22] BiocManager_1.30.10 compiler_4.0.3 httr_1.4.2
[25] backports_1.2.0 assertthat_0.2.1 cli_2.2.0
[28] later_1.1.0.1 htmltools_0.5.1 prettyunits_1.1.1
[31] tools_4.0.3 igraph_1.2.6 gtable_0.3.0
[34] glue_1.4.2 Rcpp_1.0.6 Biobase_2.50.0
[37] cellranger_1.1.0 vctrs_0.3.6 Biostrings_2.58.0
[40] rhdf5filters_1.2.0 multtest_2.46.0 ape_5.4-1
[43] nlme_3.1-151 iterators_1.0.13 xfun_0.20
[46] ps_1.5.0 openxlsx_4.2.3 rvest_0.3.6
[49] lifecycle_0.2.0 statmod_1.4.35 zlibbioc_1.36.0
[52] MASS_7.3-53 scales_1.1.1 hms_1.0.0
[55] promises_1.1.1 parallel_4.0.3 biomformat_1.18.0
[58] rhdf5_2.34.0 curl_4.3 yaml_2.2.1
[61] stringi_1.5.3 S4Vectors_0.28.1 foreach_1.5.1
[64] BiocGenerics_0.36.0 zip_2.1.1 boot_1.3-25
[67] rlang_0.4.10 pkgconfig_2.0.3 evaluate_0.14
[70] Rhdf5lib_1.12.0 labeling_0.4.2 tidyselect_1.1.0
[73] plyr_1.8.6 magrittr_2.0.1 R6_2.5.0
[76] IRanges_2.24.1 generics_0.1.0 DBI_1.1.1
[79] foreign_0.8-81 pillar_1.4.7 haven_2.3.1
[82] whisker_0.4 withr_2.4.0 mgcv_1.8-33
[85] nnet_7.3-14 abind_1.4-5 survival_3.2-7
[88] modelr_0.1.8 crayon_1.3.4 rmarkdown_2.6
[91] progress_1.2.2 grid_4.0.3 git2r_0.28.0
[94] reprex_0.3.0 digest_0.6.27 webshot_0.5.2
[97] httpuv_1.5.5 numDeriv_2016.8-1.1 stats4_4.0.3
[100] munsell_0.5.0