Question 3

Update as of 2021-03-04

NCI 16s data

Double Checking Data
Ordered Logistic Regression

Model 0: TS ~ 1
Model 1: TS ~ OTU
Model 2: TS ~ OTU + COVARIATES
Proportional Odds Assumption

TCGA RNAseq data

Double Checking Data
Ordered Logistic Regression

Model 0: TS ~ 1
Model 1: TS ~ OTU
Model 2: TS ~ OTU + COVARIATES
Proportional Odds Assumption

Last updated: 2021-03-05

Checks: 6 1

Knit directory: esoph-micro-cancer-workflow/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200916)

The command set.seed(20200916) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 092bf33

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 092bf33. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/figure/
    Ignored:    data/

Unstaged changes:
    Modified:   analysis/ordered-logit-model.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/ordered-logit-model.Rmd) and HTML (docs/ordered-logit-model.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	cb97ca6	noah-padgett	2021-02-25	updated figures violin
Rmd	fe971b9	noah-padgett	2021-02-13	violin plot scale fixed
html	fe971b9	noah-padgett	2021-02-13	violin plot scale fixed

Question 3

Q3: Is fuso associated with tumor stage (pTNM) in either data set? Does X bacteria predict stage? Multivariable w/ age, sex, BMI, history of Barrett's

Add to this analysis:

Fusobacterium nucleatum
Streptococcus sanguinis
Campylobacter concisus
Prevotella spp.

TCGA drop “not reported” from tumor stage.

Update as of 2021-03-04

The goal of this analysis is still to answer the question posed above. However, the methods by which this is done has been expanded upon and the model has been built to be as interpretable as possible. In the results, a deeper explanation of the results is given to help explain the effects of adding covariates and how to interpret the parameters. In general, the model is set up to model how the probability of being in higher tumor stage depends on the individual characteristics of the study participants. Let $TS$ represents the tumor stage with levels $j = 1, 2, ..., J$ (TS have levels 0, 1, I, II, III, and IV). We aim to model the odds of being less than or equal to a particular category $\frac{Pr(TS\leq j)}{Pr(TS> j)}, j = 1, 2, ..., J-1$ We aim to model the logit, which is $log\left(\frac{Pr(TS\leq j)}{Pr(TS> j)}\right)=logit\left(Pr(TS\leq j)\right)=\beta_{0j}-\left(\eta_1x_1+\eta_2x_2+...+\eta_px_p\right)$ where the terms on the right represent how R parameterizes the ordered logit model with $\beta_{0j}$ being the intercept/threshold parameter separating category $j$ from $j+1$ . The $\eta_p$ regression weights represent the change in logits. A major assumption of the ordered logit model, as parameterized here, is the parallel lines assumption. This means the even though the intercepts differ separate the categories, the regression weights are equal for each successive category.

NCI 16s data

Double Checking Data

# in long format
table(dat.16s$tumor.stage)


    0     1     I    II   III    IV 
11088   264  6336 13464  6072  2640

# by subject
dat <- dat.16s %>% filter(OTU == "Fusobacterium_nucleatum")
table(dat$tumor.stage)


  0   1   I  II III  IV 
 42   1  24  51  23  10

sum(table(dat$tumor.stage)) # sample size match

[1] 151

mean.dat <- dat.16s.s %>%
  group_by(tumor.stage, OTU) %>%
  summarize(M = mean(Abundance))

`summarise()` has grouped output by 'tumor.stage'. You can override using the `.groups` argument.

ggplot(dat.16s.s, aes(x=tumor.stage, y=Abundance))+
  geom_violin()+
  geom_jitter(alpha=0.25,width = 0.25)+
  geom_point(data=mean.dat, aes(x=tumor.stage, y = M), size=2, alpha =0.9, color="red")+
  labs(x="Tumor Stage", y="Percent Relative Abundance",
       title="Distribution of abundance across tumor stage",
       subtitle="Red dot is average percent relative abundance")+
  scale_y_continuous(trans="pseudo_log")+
  facet_wrap(.~OTU, nrow=1, scales="free")+
  theme_classic()

Stage “1” has only 1 unique sample and will be dropped from subsequent analyses. And remove NA values (7).

dat.16s.s <- dat.16s.s %>%
  filter(tumor.stage != "1")%>%
  mutate(tumor.stage = droplevels(tumor.stage, exclude=c("1",NA)))

Now, transform data into a useful wide-format representation to avoid inflating sample size. The long-format above is very useful for making plots, but less useful for the ordered logistic regression to follow.

dat.16s.wide <- dat.16s.s %>%
  dplyr::select(OTU, Abundance, Sample.ID, BarrettsHist, bar.c, female, female.c, age.c,BMI.n, bmi.c, sample_type, tumor.cat, tumor.stage) %>%
  pivot_wider(names_from = OTU, values_from = Abundance)

Ordered Logistic Regression

Model 0: TS ~ 1

This in the intercept only model. Implying that we will only estimate the unique intercepts for the $J-1$ categories.

## fit ordered logit model and store results 'm'
# use only 1 bacteria
fit <- fit0 <- MASS::polr(tumor.stage ~ 1, data = filter(dat.16s.s, OTU =="Fusobacterium nucleatum"), Hess=TRUE)
## view a summary of the model
summary(fit)

Call:
MASS::polr(formula = tumor.stage ~ 1, data = filter(dat.16s.s, 
    OTU == "Fusobacterium nucleatum"), Hess = TRUE)

No coefficients

Intercepts:
       Value  Std. Error t value
0|I    -0.945  0.182     -5.194 
I|II   -0.241  0.164     -1.467 
II|III  1.266  0.197      6.421 
III|IV  2.639  0.327      8.062 

Residual Deviance: 445.35 
AIC: 453.35

# save fitted probabilities
# these represent the estimated probability of each tumor stage
pp <- fitted(fit)
pp[1,]

      0       I      II     III      IV 
0.27997 0.15998 0.34002 0.15335 0.06668

The model can be more fully described as $\begin{align*} logit\left(Pr(TS\leq 0)\right)&=-0.95\\ logit\left(Pr(TS\leq I)\right)&=-0.24\\ logit\left(Pr(TS\leq II)\right)&=1.27\\ logit\left(Pr(TS\leq III)\right)&=2.64\\ \end{align*}$

Interpreting the intercepts only model is relatively straightforward. The main interpretation we can obtain from this base model is the probability of each category. We obtain these probabilities by applying the inverse-logit transformation to obtain each of the cumulative probabilities and then use the relevant probabilities to compute each individual category probability. That is, $\begin{align*} Pr(TS = 0)&= Pr(TS\leq 0) = 0.28\\ Pr(TS = I)&=Pr(TS\leq I) - Pr(TS\leq 0)= 0.44-0.28 =0.16\\ Pr(TS = II)&=Pr(TS\leq II) - Pr(TS\leq I) = 0.78 - 0.44 = 0.34 \\ Pr(TS = III)&=Pr(TS\leq III) - Pr(TS\leq II) =0.93-0.78=0.15\\ Pr(TS = IV)&=1 - Pr(TS\leq IV) =1-0.93=0.07,\\ \end{align*}$ which, in all is not very informative. More information is definately gained as predictors/covariates are introduced.

Model 1: TS ~ OTU

This model expands on the previous model to include OTU abundance as a predictor of TS. OTU abundance is captured by four variables as predictors of TS (the four OTUs listed at the top of this document).

dat0 <- dat.16s.wide
## fit ordered logit model and store results 'm'
fit <- fit1 <- MASS::polr(tumor.stage ~ 1+ `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus`, data = dat.16s.wide, Hess=TRUE)

## view a summary of the model
summary(fit)

Call:
MASS::polr(formula = tumor.stage ~ 1 + `Streptococcus spp.` + 
    `Fusobacterium nucleatum` + `Prevotella melaninogenica` + 
    `Campylobacter concisus`, data = dat.16s.wide, Hess = TRUE)

Coefficients:
                                Value Std. Error t value
`Streptococcus spp.`         0.000488    0.00577  0.0846
`Fusobacterium nucleatum`    0.013240    0.01245  1.0639
`Prevotella melaninogenica`  0.015590    0.01532  1.0178
`Campylobacter concisus`    -0.037641    0.07027 -0.5357

Intercepts:
       Value  Std. Error t value
0|I    -0.821  0.274     -3.001 
I|II   -0.106  0.267     -0.397 
II|III  1.424  0.290      4.907 
III|IV  2.803  0.390      7.186 

Residual Deviance: 442.49 
AIC: 458.49

anova(fit1, fit0) # Chi-square difference test

Likelihood ratio tests of ordinal regression models

Response: tumor.stage
                                                                                                          Model
1                                                                                                             1
2 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus`
  Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
1       146      445.3                              
2       142      442.5 1 vs 2     4    2.864  0.5809

# obtain approximate p-values
ctable <- coef(summary(fit))
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
(ctable <- cbind(ctable, "p value" = p))

                                 Value Std. Error  t value   p value
`Streptococcus spp.`         0.0004884   0.005772  0.08461 9.326e-01
`Fusobacterium nucleatum`    0.0132397   0.012445  1.06385 2.874e-01
`Prevotella melaninogenica`  0.0155899   0.015317  1.01780 3.088e-01
`Campylobacter concisus`    -0.0376410   0.070265 -0.53570 5.922e-01
0|I                         -0.8207942   0.273501 -3.00106 2.690e-03
I|II                        -0.1057784   0.266764 -0.39652 6.917e-01
II|III                       1.4243029   0.290278  4.90669 9.263e-07
III|IV                       2.8030741   0.390066  7.18615 6.664e-13

# obtain CIs
(ci <- confint(fit))  # CIs assuming normality

Waiting for profiling to be done...

                               2.5 %  97.5 %
`Streptococcus spp.`        -0.01089 0.01180
`Fusobacterium nucleatum`   -0.01124 0.03852
`Prevotella melaninogenica` -0.01508 0.04568
`Campylobacter concisus`    -0.19516 0.09332

## OR and CI
exp(cbind(OR = coef(fit), ci))

                                OR  2.5 % 97.5 %
`Streptococcus spp.`        1.0005 0.9892  1.012
`Fusobacterium nucleatum`   1.0133 0.9888  1.039
`Prevotella melaninogenica` 1.0157 0.9850  1.047
`Campylobacter concisus`    0.9631 0.8227  1.098

# save fitted logits
pp <- fitted(fit)
# predictive data
dat0 <- cbind(dat0, predict(fit, newdata = dat.16s.wide, "probs"))

## melt data set to long for ggplot2
dat1 <- dat0 %>%
  pivot_longer(
    cols=`0`:`IV`,
    names_to = "TS",
    values_to = "Pred.Prob"
  ) %>%
  pivot_longer(
    cols=`Streptococcus spp.`:`Campylobacter concisus`,
    names_to="OTU",
    values_to="Abundance"
  )
## plot predicted probabilities across Abundance values for each level of OTU
## facetted by tumor.stage
ggplot(dat1, aes(x = Abundance, y = Pred.Prob, color=TS, group=TS, linetype=TS)) +
  geom_line() + 
  facet_grid(OTU ~., scales="free")+
  labs(y="Probability of Tumor Stage",
       x="Relative Abundance (%)",
       title="Tumor stage likelihood with bacteria abundance",
       color="Tumor Stage", linetype="Tumor Stage")+
  theme(
    panel.grid = element_blank()
  )

The results from the model when the abundance of the four OTUs are included in the model becomes a bit more complicated. First of all, the model with OTU abundances did not fit better in terms of AIC, BIC, and no difference was found from a chi-square difference test.

For interpretation purposes, the following is how to interpret the effects of OTU abundance on predicting tumor stage. First, the logit model is $\begin{align*} logit\left(Pr(TS\leq 0)\right)&=-0.95\\ logit\left(Pr(TS\leq I)\right)&=-0.24\\ logit\left(Pr(TS\leq II)\right)&=1.27\\ logit\left(Pr(TS\leq III)\right)&=2.64\\ \end{align*}$ While interpreting the odds ratios can be done as follows. First, take Streptococcus spp., the OR was 1.00 95% CI (0.99, 1.01). This implies that for individuals who had a 1% higher relative abundance of Streptococcus spp., the odds of being in a higher tumor stage is multiplied 1.00 times (i.e., 0% increase on average) holding the abundance of other OTUs constant. This means that the knowledge of relative abundance of Strepto was not informative over the base probability we estimated in model 0.

For Fusobacterium nucleatum (OR = 1.01, 95% CI [0.99, 1.04]) and Prevotella melaninogenica (OR = 1.01, 95% CI [0.99, 1.05]), the interpretation is as follows because the OR is the same. For individuals who had a 1% higher relative abundance of this OTU (Fuso. or Prevo.), the odds of being in a higher tumor stage is multiplied 1.01 times (i.e., about a 1% increase on average) holding the abundance of other OTUs constant.

Model 2: TS ~ OTU + COVARIATES

dat0 <- dat.16s.wide
## fit ordered logit model and store results 'm'
fit <- fit2 <- MASS::polr(tumor.stage ~ 1+ `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus` + age.c + female.c + bmi.c + bar.c, data = dat.16s.wide, Hess=TRUE)

## view a summary of the model
summary(fit)

Call:
MASS::polr(formula = tumor.stage ~ 1 + `Streptococcus spp.` + 
    `Fusobacterium nucleatum` + `Prevotella melaninogenica` + 
    `Campylobacter concisus` + age.c + female.c + bmi.c + bar.c, 
    data = dat.16s.wide, Hess = TRUE)

Coefficients:
                               Value Std. Error t value
`Streptococcus spp.`         0.00250    0.00589  0.4240
`Fusobacterium nucleatum`    0.01447    0.01264  1.1453
`Prevotella melaninogenica`  0.01515    0.01595  0.9501
`Campylobacter concisus`    -0.03350    0.07537 -0.4444
age.c                       -0.00415    0.01371 -0.3025
female.c                    -0.01082    0.43529 -0.0248
bmi.c                       -0.01602    0.02275 -0.7043
bar.c                        0.62262    0.34717  1.7934

Intercepts:
       Value  Std. Error t value
0|I    -0.776  0.276     -2.812 
I|II   -0.051  0.270     -0.187 
II|III  1.500  0.295      5.080 
III|IV  2.892  0.394      7.334 

Residual Deviance: 438.63 
AIC: 462.63

anova(fit2, fit0) # Chi-square difference test

Likelihood ratio tests of ordinal regression models

Response: tumor.stage
                                                                                                                                             Model
1                                                                                                                                                1
2 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus` + age.c + female.c + bmi.c + bar.c
  Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
1       146      445.3                              
2       138      438.6 1 vs 2     8    6.715  0.5677

# obtain approximate p-values
ctable <- coef(summary(fit))
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
(ctable <- cbind(ctable, "p value" = p))

                                Value Std. Error  t value   p value
`Streptococcus spp.`         0.002499   0.005893  0.42397 6.716e-01
`Fusobacterium nucleatum`    0.014473   0.012636  1.14533 2.521e-01
`Prevotella melaninogenica`  0.015152   0.015948  0.95008 3.421e-01
`Campylobacter concisus`    -0.033496   0.075370 -0.44442 6.567e-01
age.c                       -0.004149   0.013715 -0.30249 7.623e-01
female.c                    -0.010816   0.435289 -0.02485 9.802e-01
bmi.c                       -0.016024   0.022751 -0.70430 4.812e-01
bar.c                        0.622623   0.347169  1.79343 7.290e-02
0|I                         -0.775974   0.275914 -2.81238 4.918e-03
I|II                        -0.050542   0.270417 -0.18691 8.517e-01
II|III                       1.500106   0.295276  5.08035 3.767e-07
III|IV                       2.892116   0.394360  7.33369 2.239e-13

# obtain CIs
(ci <- confint(fit))  # CIs assuming normality

Waiting for profiling to be done...

                                2.5 %  97.5 %
`Streptococcus spp.`        -0.009111 0.01405
`Fusobacterium nucleatum`   -0.010312 0.04018
`Prevotella melaninogenica` -0.016973 0.04624
`Campylobacter concisus`    -0.199242 0.10805
age.c                       -0.031171 0.02275
female.c                    -0.873271 0.83890
bmi.c                       -0.061385 0.02842
bar.c                       -0.056145 1.30794

## OR and CI
exp(cbind(OR = coef(fit), ci))

                                OR  2.5 % 97.5 %
`Streptococcus spp.`        1.0025 0.9909  1.014
`Fusobacterium nucleatum`   1.0146 0.9897  1.041
`Prevotella melaninogenica` 1.0153 0.9832  1.047
`Campylobacter concisus`    0.9671 0.8194  1.114
age.c                       0.9959 0.9693  1.023
female.c                    0.9892 0.4176  2.314
bmi.c                       0.9841 0.9405  1.029
bar.c                       1.8638 0.9454  3.699

# save fitted logits
pp <- fitted(fit)
# predictive data
dat0 <- cbind(dat0, predict(fit, newdata = dat.16s.wide, "probs"))

## melt data set to long for ggplot2
dat1 <- dat0 %>%
  pivot_longer(
    cols=`0`:`IV`,
    names_to = "TS",
    values_to = "Pred.Prob"
  ) %>%
  pivot_longer(
    cols=`Streptococcus spp.`:`Campylobacter concisus`,
    names_to="OTU",
    values_to="Abundance"
  )
## plot predicted probabilities across Abundance values for each level of OTU
## facetted by tumor.stage
ggplot(dat1, aes(x = Abundance, y = Pred.Prob, color=TS, group=TS, linetype=TS)) +
  geom_line() + 
  facet_grid(OTU ~., scales="free")+
  labs(y="Probability of Tumor Stage",
       x="Relative Abundance (%)",
       title="Tumor stage likelihood with bacteria abundance",
       color="Tumor Stage", linetype="Tumor Stage")+
  theme(
    panel.grid = element_blank()
  )

No significant changes. Everything is still insignificant. For interpretation purposes, the results in from this model can be interpreted the same as the previous because all covariates were mean-centered.

Proportional Odds Assumption

dat.16s.wide <- dat.16s.wide %>%
  mutate(
    T0 = I(as.numeric(tumor.stage) >= 1),
    TI = I(as.numeric(tumor.stage) >= 2),
    TII = I(as.numeric(tumor.stage) >= 3),
    TIII = I(as.numeric(tumor.stage) >= 4),
    TIV = I(as.numeric(tumor.stage) >= 5),
  )
apply(dat.16s.wide[,c('TI','TII','TIII', 'TIV')],2, table)

       TI TII TIII TIV
FALSE  42  66  117 140
TRUE  108  84   33  10

summary(glm(TI  ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus`+ age.c + female.c + bmi.c + bar.c, data = dat.16s.wide, family=binomial(link="logit")))


Call:
glm(formula = TI ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + 
    `Prevotella melaninogenica` + `Campylobacter concisus` + 
    age.c + female.c + bmi.c + bar.c, family = binomial(link = "logit"), 
    data = dat.16s.wide)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.001  -1.186   0.624   0.828   1.259  

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)  
(Intercept)                  0.49816    0.30842    1.62    0.106  
`Streptococcus spp.`         0.01025    0.00768    1.34    0.182  
`Fusobacterium nucleatum`    0.06398    0.04394    1.46    0.145  
`Prevotella melaninogenica`  0.01499    0.02613    0.57    0.566  
`Campylobacter concisus`     0.00351    0.11131    0.03    0.975  
age.c                        0.00969    0.01783    0.54    0.587  
female.c                    -0.16395    0.51003   -0.32    0.748  
bmi.c                        0.01611    0.03096    0.52    0.603  
bar.c                        0.88774    0.44663    1.99    0.047 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 177.89  on 149  degrees of freedom
Residual deviance: 163.73  on 141  degrees of freedom
AIC: 181.7

Number of Fisher Scoring iterations: 6

summary(glm(TII ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus`+ age.c + female.c + bmi.c + bar.c, data = dat.16s.wide, family=binomial(link="logit")))


Call:
glm(formula = TII ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + 
    `Prevotella melaninogenica` + `Campylobacter concisus` + 
    age.c + female.c + bmi.c + bar.c, family = binomial(link = "logit"), 
    data = dat.16s.wide)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.489  -1.212   0.704   1.108   1.499  

Coefficients:
                             Estimate Std. Error z value Pr(>|z|)  
(Intercept)                  0.000384   0.283602    0.00    0.999  
`Streptococcus spp.`         0.003275   0.006798    0.48    0.630  
`Fusobacterium nucleatum`    0.073311   0.038078    1.93    0.054 .
`Prevotella melaninogenica` -0.001872   0.020207   -0.09    0.926  
`Campylobacter concisus`    -0.059713   0.088063   -0.68    0.498  
age.c                       -0.014604   0.015986   -0.91    0.361  
female.c                     0.402529   0.486388    0.83    0.408  
bmi.c                        0.006519   0.026552    0.25    0.806  
bar.c                        0.486287   0.399061    1.22    0.223  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 205.78  on 149  degrees of freedom
Residual deviance: 195.34  on 141  degrees of freedom
AIC: 213.3

Number of Fisher Scoring iterations: 5

summary(glm(TIII ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus`+ age.c + female.c + bmi.c + bar.c, data = dat.16s.wide, family=binomial(link="logit")))


Call:
glm(formula = TIII ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + 
    `Prevotella melaninogenica` + `Campylobacter concisus` + 
    age.c + female.c + bmi.c + bar.c, family = binomial(link = "logit"), 
    data = dat.16s.wide)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.347  -0.756  -0.576  -0.202   2.107  

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                 -1.26111    0.36325   -3.47  0.00052 ***
`Streptococcus spp.`        -0.00693    0.00900   -0.77  0.44116    
`Fusobacterium nucleatum`   -0.01641    0.02223   -0.74  0.46031    
`Prevotella melaninogenica`  0.03267    0.02188    1.49  0.13532    
`Campylobacter concisus`    -0.40193    0.36903   -1.09  0.27608    
age.c                        0.00600    0.01879    0.32  0.74944    
female.c                    -0.32657    0.63515   -0.51  0.60714    
bmi.c                       -0.10841    0.04514   -2.40  0.01632 *  
bar.c                        0.80964    0.49326    1.64  0.10071    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 158.07  on 149  degrees of freedom
Residual deviance: 144.20  on 141  degrees of freedom
AIC: 162.2

Number of Fisher Scoring iterations: 7

summary(glm(TIV  ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus`+ age.c + female.c + bmi.c + bar.c, data = dat.16s.wide, family=binomial(link="logit")))

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred


Call:
glm(formula = TIV ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + 
    `Prevotella melaninogenica` + `Campylobacter concisus` + 
    age.c + female.c + bmi.c + bar.c, family = binomial(link = "logit"), 
    data = dat.16s.wide)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.0213  -0.3107  -0.1561  -0.0027   2.2971  

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                 -3.53618    0.87137   -4.06  4.9e-05 ***
`Streptococcus spp.`        -0.00368    0.01603   -0.23   0.8184    
`Fusobacterium nucleatum`   -2.65756    2.07129   -1.28   0.1995    
`Prevotella melaninogenica`  0.15010    0.05747    2.61   0.0090 ** 
`Campylobacter concisus`     0.42004    0.33434    1.26   0.2090    
age.c                       -0.01818    0.03911   -0.46   0.6420    
female.c                    -1.50786    1.16883   -1.29   0.1970    
bmi.c                       -0.28740    0.10923   -2.63   0.0085 ** 
bar.c                        0.16138    0.94158    0.17   0.8639    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 73.479  on 149  degrees of freedom
Residual deviance: 47.240  on 141  degrees of freedom
AIC: 65.24

Number of Fisher Scoring iterations: 13

TCGA RNAseq data

Double Checking Data

# in long format
table(dat.rna.s$tumor.stage)


  I  II III  IV 
 84 284 204  32

# by subject
dat <- dat.rna.s %>% filter(OTU == "Fusobacterium nucleatum")
table(dat$tumor.stage)


  I  II III  IV 
 21  71  51   8

sum(table(dat$tumor.stage))

[1] 151

nrow(dat) # matches but there's missing data

[1] 173

sum(is.na(dat$Abundance))

[1] 107

# number of non-missing abundnance data
nrow(dat) - sum(is.na(dat$Abundance))

[1] 66

# matches 
dat0 <- dat %>%
  filter()
table(dat0$tumor.stage)


  I  II III  IV 
 21  71  51   8

# plot functions
#root function
root<-function(x){
  x <- ifelse(x < 0, 0, x)
  x**(0.2)
}
#inverse root function
invroot<-function(x){
  x**(5)
}

mean.dat <- dat.rna.s %>%
  group_by(tumor.stage, OTU) %>%
  summarize(M = mean(Abundance, na.rm=T))

`summarise()` has grouped output by 'tumor.stage'. You can override using the `.groups` argument.

ggplot(dat.rna.s, aes(x=tumor.stage, y=Abundance))+
  geom_violin()+
  geom_jitter(alpha=0.25,width = 0.25)+
  geom_point(data=mean.dat, aes(x=tumor.stage, y = M), size=2, alpha =0.9, color="red")+
  labs(x="Tumor Stage", y="Percent Relative Abundance",
       title="Distribution of abundance across tumor stage",
       subtitle="Red dot is average percent relative abundance")+
  scale_y_continuous(
    trans=scales::trans_new("root", root, invroot),
    breaks=c(0, 0.001,0.01, 0.1, 1,10,50),
    labels = c(0, 0.001,0.01, 0.1, 1,10,50),
    limits = c(0, 50)
  )+
  facet_wrap(.~OTU, nrow=1, scales="free")+
  theme_classic()

Warning: Removed 428 rows containing non-finite values (stat_ydensity).

Warning: Removed 461 rows containing missing values (geom_point).

Stage IV only has two cases, remove for analysis.

dat.rna.s <- dat.rna.s %>%
  filter(is.na(Abundance) == F, tumor.stage!="IV")%>%
  mutate(
    tumor.stage = factor(
      tumor.stage,
      levels = c("I", "II", "III"), ordered=T
    )
  )

dat.rna.wide <- dat.rna.s %>%
  dplyr::select(OTU, Abundance, ID, BarrettsHist, bar.c, female.c, age.c, bmi.c, tumor.stage) %>%
  
  pivot_wider(names_from = OTU, values_from = Abundance)
dat.rna.wide$age.c[is.na(dat.rna.wide$age.c)]<-0

Ordered Logistic Regression

Model 0: TS ~ 1

This in the intercept only model. Implying that we will only estimate the unique intercepts for the $J-1$ categories.

## fit ordered logit model and store results 'm'
fit <- fit0 <- MASS::polr(tumor.stage ~ 1, data = dat.rna.wide, Hess=TRUE)
## view a summary of the model
summary(fit)

Call:
MASS::polr(formula = tumor.stage ~ 1, data = dat.rna.wide, Hess = TRUE)

No coefficients

Intercepts:
       Value  Std. Error t value
I|II   -1.276  0.326     -3.909 
II|III  0.804  0.292      2.757 

Residual Deviance: 115.42 
AIC: 119.42

# save fitted probabilities
# these represent the estimated probability of each tumor stage
pp <- fitted(fit)
pp[1,]

     I     II    III 
0.2182 0.4727 0.3091

Model 1: TS ~ OTU

dat0 <- dat.rna.wide
## fit ordered logit model and store results 'm'
fit <- fit1 <- MASS::polr(tumor.stage ~ 1+ `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus`, data = dat.rna.wide, Hess=TRUE)

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## view a summary of the model
summary(fit)

Call:
MASS::polr(formula = tumor.stage ~ 1 + `Streptococcus spp.` + 
    `Fusobacterium nucleatum` + `Prevotella melaninogenica` + 
    `Campylobacter concisus`, data = dat.rna.wide, Hess = TRUE)

Coefficients:
                              Value Std. Error t value
`Streptococcus spp.`         2.7963      4.913   0.569
`Fusobacterium nucleatum`   -0.0525      0.159  -0.330
`Prevotella melaninogenica` -0.6643      1.628  -0.408
`Campylobacter concisus`     7.2512     15.257   0.475

Intercepts:
       Value  Std. Error t value
I|II   -1.210  0.367     -3.296 
II|III  0.884  0.345      2.565 

Residual Deviance: 114.53 
AIC: 126.53

anova(fit1, fit0) # Chi-square difference test

Likelihood ratio tests of ordinal regression models

Response: tumor.stage
                                                                                                          Model
1                                                                                                             1
2 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus`
  Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
1        53      115.4                              
2        49      114.5 1 vs 2     4   0.8931  0.9255

# obtain approximate p-values
ctable <- coef(summary(fit))
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
(ctable <- cbind(ctable, "p value" = p))

                               Value Std. Error t value   p value
`Streptococcus spp.`         2.79631     4.9131  0.5692 0.5692510
`Fusobacterium nucleatum`   -0.05254     0.1594 -0.3296 0.7416751
`Prevotella melaninogenica` -0.66427     1.6278 -0.4081 0.6832171
`Campylobacter concisus`     7.25117    15.2571  0.4753 0.6345980
I|II                        -1.21043     0.3673 -3.2957 0.0009819
II|III                       0.88364     0.3445  2.5646 0.0103282

# obtain CIs
(ci <- confint(fit))  # CIs assuming normality

Waiting for profiling to be done...

                               2.5 %  97.5 %
`Streptococcus spp.`         -7.0912 13.0226
`Fusobacterium nucleatum`    -0.3907  0.2666
`Prevotella melaninogenica`  -4.6019  2.3536
`Campylobacter concisus`    -19.8730 43.8222

## OR and CI
exp(cbind(OR = coef(fit), ci))

                                   OR     2.5 %    97.5 %
`Streptococcus spp.`          16.3841 8.324e-04 4.525e+05
`Fusobacterium nucleatum`      0.9488 6.766e-01 1.306e+00
`Prevotella melaninogenica`    0.5146 1.003e-02 1.052e+01
`Campylobacter concisus`    1409.7570 2.340e-09 1.076e+19

# predictive data
pp <- as.data.frame(predict(fit, newdata = dat.rna.wide, "probs"))
dat0 <- cbind(dat0, pp)

## melt data set to long for ggplot2
dat1 <- dat0 %>%
  pivot_longer(
    cols=`I`:`III`,
    names_to = "TS",
    values_to = "Pred.Prob"
  ) %>%
  pivot_longer(
    cols=`Fusobacterium nucleatum`:`Campylobacter concisus`,
    names_to="OTU",
    values_to="Abundance"
  )
## plot predicted probabilities across Abundance values for each level of OTU
## facetted by tumor.stage
ggplot(dat1, aes(x = Abundance, y = Pred.Prob, color=TS, group=TS, linetype=TS)) +
  geom_line() + 
  facet_grid(OTU ~., scales="free")+
  labs(y="Probability of Tumor Stage",
       x="Relative Abundance (%)",
       title="Tumor stage likelihood with bacteria abundance",
       color="Tumor Stage", linetype="Tumor Stage")+
  scale_x_continuous(
    trans=scales::trans_new("root", root, invroot),
    breaks=c(0, 0.001,0.01, 0.1, 1,10,25),
    labels = c(0, 0.001,0.01, 0.1, 1,10,25),
    limits = c(0, 25)
  )+
  theme(
    panel.grid = element_blank()
  )

Model 2: TS ~ OTU + COVARIATES

dat0 <- dat.rna.wide
## fit ordered logit model and store results 'm'
fit <- fit2 <- MASS::polr(tumor.stage ~ 1+ `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus` + age.c + female.c + bmi.c + bar.c, data = dat.rna.wide, Hess=TRUE)

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## view a summary of the model
summary(fit)

Call:
MASS::polr(formula = tumor.stage ~ 1 + `Streptococcus spp.` + 
    `Fusobacterium nucleatum` + `Prevotella melaninogenica` + 
    `Campylobacter concisus` + age.c + female.c + bmi.c + bar.c, 
    data = dat.rna.wide, Hess = TRUE)

Coefficients:
                              Value Std. Error t value
`Streptococcus spp.`         0.8852     5.2418   0.169
`Fusobacterium nucleatum`   -0.1576     0.1786  -0.882
`Prevotella melaninogenica`  1.0219     1.9421   0.526
`Campylobacter concisus`    -6.2926    17.9404  -0.351
age.c                       -0.0150     0.0233  -0.645
female.c                    -2.4460     0.8541  -2.864
bmi.c                        0.0144     0.0456   0.315
bar.c                       -2.0994     1.1047  -1.900

Intercepts:
       Value  Std. Error t value
I|II   -1.644  0.451     -3.645 
II|III  1.086  0.374      2.900 

Residual Deviance: 97.52 
AIC: 117.52

anova(fit2, fit0) # Chi-square difference test

Likelihood ratio tests of ordinal regression models

Response: tumor.stage
                                                                                                                                             Model
1                                                                                                                                                1
2 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus` + age.c + female.c + bmi.c + bar.c
  Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
1        53     115.42                              
2        45      97.52 1 vs 2     8     17.9   0.022

# obtain approximate p-values
ctable <- coef(summary(fit))
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
(ctable <- cbind(ctable, "p value" = p))

                               Value Std. Error t value   p value
`Streptococcus spp.`         0.88522    5.24183  0.1689 0.8658937
`Fusobacterium nucleatum`   -0.15762    0.17864 -0.8823 0.3776030
`Prevotella melaninogenica`  1.02186    1.94212  0.5262 0.5987809
`Campylobacter concisus`    -6.29262   17.94037 -0.3508 0.7257744
age.c                       -0.01502    0.02329 -0.6451 0.5188823
female.c                    -2.44604    0.85409 -2.8639 0.0041846
bmi.c                        0.01437    0.04562  0.3150 0.7527811
bar.c                       -2.09943    1.10468 -1.9005 0.0573691
I|II                        -1.64361    0.45090 -3.6452 0.0002672
II|III                       1.08557    0.37430  2.9003 0.0037283

# obtain CIs
(ci <- confint(fit))  # CIs assuming normality

Waiting for profiling to be done...

                                2.5 %   97.5 %
`Streptococcus spp.`         -9.44506 11.77218
`Fusobacterium nucleatum`    -0.51454  0.20082
`Prevotella melaninogenica`  -3.09305  4.88336
`Campylobacter concisus`    -41.54235 31.43536
age.c                        -0.06142  0.03073
female.c                     -4.24838 -0.84159
bmi.c                        -0.07663  0.10587
bar.c                        -4.39018  0.11094

## OR and CI
exp(cbind(OR = coef(fit), ci))

                                 OR     2.5 %    97.5 %
`Streptococcus spp.`        2.42352 7.908e-05 1.296e+05
`Fusobacterium nucleatum`   0.85418 5.978e-01 1.222e+00
`Prevotella melaninogenica` 2.77834 4.536e-02 1.321e+02
`Campylobacter concisus`    0.00185 9.086e-19 4.490e+13
age.c                       0.98509 9.404e-01 1.031e+00
female.c                    0.08664 1.429e-02 4.310e-01
bmi.c                       1.01447 9.262e-01 1.112e+00
bar.c                       0.12253 1.240e-02 1.117e+00

# predictive data
pp <- as.data.frame(predict(fit, newdata = dat.rna.wide, "probs"))
dat0 <- cbind(dat0, pp)

## melt data set to long for ggplot2
dat1 <- dat0 %>%
  pivot_longer(
    cols=`I`:`III`,
    names_to = "TS",
    values_to = "Pred.Prob"
  ) %>%
  pivot_longer(
    cols=`Fusobacterium nucleatum`:`Campylobacter concisus`,
    names_to="OTU",
    values_to="Abundance"
  )
## plot predicted probabilities across Abundance values for each level of OTU
## facetted by tumor.stage
ggplot(dat1, aes(x = Abundance, y = Pred.Prob, color=TS, group=TS, linetype=TS)) +
  geom_line() + 
  facet_grid(OTU ~., scales="free")+
  labs(y="Probability of Tumor Stage",
       x="Relative Abundance (%)",
       title="Tumor stage likelihood with bacteria abundance",
       color="Tumor Stage", linetype="Tumor Stage")+
  scale_x_continuous(
    trans=scales::trans_new("root", root, invroot),
    breaks=c(0, 0.001,0.01, 0.1, 1,10,25),
    labels = c(0, 0.001,0.01, 0.1, 1,10,25),
    limits = c(0, 25)
  )+
  theme(
    panel.grid = element_blank()
  )

Proportional Odds Assumption

dat.rna.wide <- dat.rna.wide %>%
  mutate(
    TI = I(as.numeric(tumor.stage) >= 1),
    TII = I(as.numeric(tumor.stage) >= 2),
    TIII = I(as.numeric(tumor.stage) >= 3)
  )
apply(dat.rna.wide[,c('TII','TIII')],2, table)

      TII TIII
FALSE  12   38
TRUE   43   17

summary(glm(TII ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus`+ age.c + female.c + bmi.c + bar.c, data = dat.rna.wide, family=binomial(link="logit")))

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred


Call:
glm(formula = TII ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + 
    `Prevotella melaninogenica` + `Campylobacter concisus` + 
    age.c + female.c + bmi.c + bar.c, family = binomial(link = "logit"), 
    data = dat.rna.wide)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.157   0.000   0.138   0.333   1.762  

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)   
(Intercept)                   0.0420     0.9230    0.05   0.9637   
`Streptococcus spp.`        184.3652   108.3115    1.70   0.0887 . 
`Fusobacterium nucleatum`    -1.0753     0.8912   -1.21   0.2276   
`Prevotella melaninogenica`  26.7602    51.5574    0.52   0.6037   
`Campylobacter concisus`     99.8426   493.2974    0.20   0.8396   
age.c                        -0.0457     0.0458   -1.00   0.3189   
female.c                     -3.4180     1.3056   -2.62   0.0088 **
bmi.c                         0.0139     0.0561    0.25   0.8049   
bar.c                        -2.2659     1.3256   -1.71   0.0874 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 57.706  on 54  degrees of freedom
Residual deviance: 27.629  on 46  degrees of freedom
AIC: 45.63

Number of Fisher Scoring iterations: 9

summary(glm(TIII ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + `Prevotella melaninogenica` + `Campylobacter concisus`+ age.c + female.c + bmi.c + bar.c, data = dat.rna.wide, family=binomial(link="logit")))

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred


Call:
glm(formula = TIII ~ 1 + `Streptococcus spp.` + `Fusobacterium nucleatum` + 
    `Prevotella melaninogenica` + `Campylobacter concisus` + 
    age.c + female.c + bmi.c + bar.c, family = binomial(link = "logit"), 
    data = dat.rna.wide)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5739  -0.7366  -0.0001   0.9631   2.2206  

Coefficients:
                             Estimate Std. Error z value Pr(>|z|)  
(Intercept)                 -5.52e+00   6.71e+02   -0.01    0.993  
`Streptococcus spp.`        -2.28e+01   1.09e+01   -2.10    0.036 *
`Fusobacterium nucleatum`   -1.62e-01   2.70e-01   -0.60    0.548  
`Prevotella melaninogenica` -3.18e+01   1.36e+01   -2.33    0.020 *
`Campylobacter concisus`     2.10e+02   9.21e+01    2.29    0.022 *
age.c                        7.13e-03   3.00e-02    0.24    0.812  
female.c                    -1.86e+01   2.69e+03   -0.01    0.994  
bmi.c                       -8.57e-02   8.76e-02   -0.98    0.328  
bar.c                       -1.87e+01   3.20e+03   -0.01    0.995  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 68.021  on 54  degrees of freedom
Residual deviance: 45.884  on 46  degrees of freedom
AIC: 63.88

Number of Fisher Scoring iterations: 18

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_1.1.1     dendextend_1.14.0 ggdendro_0.1.22   reshape2_1.4.4   
 [5] car_3.0-10        carData_3.0-4     gvlma_1.0.0.3     patchwork_1.1.1  
 [9] viridis_0.5.1     viridisLite_0.3.0 gridExtra_2.3     xtable_1.8-4     
[13] kableExtra_1.3.1  MASS_7.3-53       data.table_1.13.6 readxl_1.3.1     
[17] forcats_0.5.1     stringr_1.4.0     dplyr_1.0.3       purrr_0.3.4      
[21] readr_1.4.0       tidyr_1.1.2       tibble_3.0.6      ggplot2_3.3.3    
[25] tidyverse_1.3.0   lmerTest_3.1-3    lme4_1.1-26       Matrix_1.2-18    
[29] vegan_2.5-7       lattice_0.20-41   permute_0.9-5     phyloseq_1.34.0  
[33] workflowr_1.6.2  

loaded via a namespace (and not attached):
 [1] minqa_1.2.4         colorspace_2.0-0    rio_0.5.16         
 [4] ellipsis_0.3.1      rprojroot_2.0.2     XVector_0.30.0     
 [7] fs_1.5.0            rstudioapi_0.13     farver_2.0.3       
[10] lubridate_1.7.9.2   xml2_1.3.2          codetools_0.2-16   
[13] splines_4.0.3       knitr_1.31          ade4_1.7-16        
[16] jsonlite_1.7.2      nloptr_1.2.2.2      broom_0.7.4        
[19] cluster_2.1.0       dbplyr_2.1.0        BiocManager_1.30.10
[22] compiler_4.0.3      httr_1.4.2          backports_1.2.1    
[25] assertthat_0.2.1    cli_2.3.0           later_1.1.0.1      
[28] htmltools_0.5.1.1   prettyunits_1.1.1   tools_4.0.3        
[31] igraph_1.2.6        gtable_0.3.0        glue_1.4.2         
[34] Rcpp_1.0.6          Biobase_2.50.0      cellranger_1.1.0   
[37] vctrs_0.3.6         Biostrings_2.58.0   rhdf5filters_1.2.0 
[40] multtest_2.46.0     ape_5.4-1           nlme_3.1-149       
[43] iterators_1.0.13    xfun_0.20           ps_1.5.0           
[46] openxlsx_4.2.3      rvest_0.3.6         lifecycle_0.2.0    
[49] statmod_1.4.35      zlibbioc_1.36.0     scales_1.1.1       
[52] hms_1.0.0           promises_1.1.1      parallel_4.0.3     
[55] biomformat_1.18.0   rhdf5_2.34.0        curl_4.3           
[58] yaml_2.2.1          stringi_1.5.3       highr_0.8          
[61] S4Vectors_0.28.1    foreach_1.5.1       BiocGenerics_0.36.0
[64] zip_2.1.1           boot_1.3-25         rlang_0.4.10       
[67] pkgconfig_2.0.3     evaluate_0.14       Rhdf5lib_1.12.1    
[70] labeling_0.4.2      tidyselect_1.1.0    plyr_1.8.6         
[73] magrittr_2.0.1      R6_2.5.0            IRanges_2.24.1     
[76] generics_0.1.0      DBI_1.1.1           foreign_0.8-80     
[79] pillar_1.4.7        haven_2.3.1         whisker_0.4        
[82] withr_2.4.1         mgcv_1.8-33         abind_1.4-5        
[85] survival_3.2-7      modelr_0.1.8        crayon_1.4.1       
[88] rmarkdown_2.6       progress_1.2.2      grid_4.0.3         
[91] git2r_0.28.0        reprex_1.0.0        digest_0.6.27      
[94] webshot_0.5.2       httpuv_1.5.5        numDeriv_2016.8-1.1
[97] stats4_4.0.3        munsell_0.5.0

Results for Ordered Logistic Regression Models

Question 3

Update as of 2021-03-04

NCI 16s data

Double Checking Data

Ordered Logistic Regression

Model 0: TS ~ 1

Model 1: TS ~ OTU

Model 2: TS ~ OTU + COVARIATES

Proportional Odds Assumption

TCGA RNAseq data

Double Checking Data

Ordered Logistic Regression

Model 0: TS ~ 1

Model 1: TS ~ OTU

Model 2: TS ~ OTU + COVARIATES

Proportional Odds Assumption