class: left, middle, inverse, title-slide

.title[
# Confounding II
]
.author[
### Mabel Carabali
]
.institute[
### EBOH, McGill University
]
.date[
### 01-08-2022 (updated: 2024-10-01)
]

---
class: middle

# The Structure of Confounding
## and worked examples!

---
class: middle

### Conditions that allow a variable to be a confounder:

📣 **Modern Epidemiology, 4th ed., page 268** 💡

> <span style="color:darkblue"> _The developments in causal inference over the past decades, summarized in Chapter 3, have made clear that this definition [ ...the traditional criteria described from ME3... ] of a “confounder” is inadequate. It is inadequate because there can be a pre-exposure variable associated with the exposure and the outcome, the control of which introduces, rather than eliminates, bias_ [ME4;p268] </span>

---
class: middle

### .red[The Structure of Confounding??]

<img src="images/dogmath.png" width="60%" style="display: block; margin: auto 0 auto auto;" />

---
class: middle

### The Structure of Confounding

`$$A \leftarrow L \to Y, \qquad A \to Y$$`

This diagram shows two sources of association between treatment and outcome:

1. The path `\(A → Y\)`, which represents the causal effect of A on Y, and
2. The path `\(A ← L → Y\)`, which links A and Y through their common cause `\(L\)`.

- The non-causal path `\(A ← L → Y\)` is a **_"backdoor path"_**.

---
class: middle

### The structure of Confounding

- In a causal DAG, a backdoor path is a non-causal path between treatment and outcome that remains even if all arrows pointing from treatment to other variables (i.e., to the descendants of treatment) are removed.
- That is, the path has an arrow pointing into treatment.

<img src="images/L6_HRfig71.png" width="60%" />

---
class: middle

### Confounding and exchangeability

- The backdoor criterion **does not** answer questions about the magnitude or direction of confounding.
- It is possible that some unblocked backdoor paths are weak and therefore induce little bias, or that several strong backdoor paths induce bias in opposite directions and therefore produce a weak net bias.
- Because unmeasured confounding is not an “all or nothing” issue, in practice it is important to consider the expected direction and magnitude of the bias.

---
class: middle

## Confounders ( `\(Y\)` ← `\(L\)` → `\(A\)` )

.pull-left[
<img src="L8_EPIB704_Confounding_II_files/figure-html/unnamed-chunk-3-1.png" width="80%" />
]

.pull-right[
**Simulated Example**

```r
set.seed(704)
N <- 100
L <- rbinom(N, 1, 0.5)
A <- ifelse(L == 0, rbinom(N, 1, 0.25), rbinom(N, 1, 0.75))  # L affects A
Y <- ifelse(L == 0, rbinom(N, 1, 0.20), rbinom(N, 1, 0.8))   # only L affects Y
#summary(L)
data <- data.frame(N, A, L, Y)
# NB: epi.2by2() reads row 1 as exposed and column 1 as outcome-positive, but
# table() sorts 0 before 1, so "Exposed +" below is A = 0 and "Outcome +" is Y = 0
# (the odds ratio and risk difference are unaffected; the risk ratio and
# attributable fractions are not)
tab <- table(data$A, data$Y)
#tab; tab/margin.table(tab)
l6conf1 <- epiR::epi.2by2(tab, method = "cohort.count")
```
]

--

```
##              Outcome +    Outcome -      Total                 Inc risk *
## Exposed +           39           13         52     75.00 (61.05 to 85.97)
## Exposed -           14           34         48     29.17 (16.95 to 44.06)
## Total               53           47        100     53.00 (42.76 to 63.06)
```

.small[*Outcomes per 100 population units]

---
class: middle

## Confounders

.pull-left[
**Crude**

```r
tabl6conf1 <- data.table::as.data.table(l6conf1$massoc.summary)
kable(tabl6conf1, digits = 2) %>% kable_paper()
```

<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> var </th> <th style="text-align:right;"> est </th> <th style="text-align:right;"> lower </th> <th style="text-align:right;"> upper </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio </td> <td style="text-align:right;"> 2.57 </td> <td style="text-align:right;"> 1.61 </td> <td style="text-align:right;"> 4.11 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio </td> <td style="text-align:right;"> 7.29 </td> <td style="text-align:right;"> 3.01 </td> <td style="text-align:right;"> 17.63 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk * </td> <td style="text-align:right;"> 45.83 </td> <td style="text-align:right;"> 28.40 </td> <td style="text-align:right;"> 63.26 </td> </tr> <tr> <td style="text-align:left;"> Attrib fraction in exposed (%) </td> <td style="text-align:right;"> 61.11 </td> <td style="text-align:right;"> 37.90 </td> <td style="text-align:right;"> 75.64 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk in population * </td> <td style="text-align:right;"> 23.83 </td> <td style="text-align:right;"> 7.68 </td> <td style="text-align:right;"> 39.99 </td> </tr> <tr> <td style="text-align:left;"> Attrib fraction in population (%) </td> <td style="text-align:right;"> 44.97 </td> <td style="text-align:right;"> 21.55 </td> <td style="text-align:right;"> 61.40 </td> </tr> </tbody> </table>
]

--

.pull-right[

```r
l6strtab1 <- data %>%
  tbl_summary(by = L,
              label = list(Y = "Outcome", A = "Exposure"),
              #type = all_continuous() ~ "continuous1",
              statistic = all_categorical() ~ c("{n} / {N} ({p}%)"),
              missing = "no") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**L=0/L=1**") %>%
  modify_caption("**Summary of covars distribution**") %>%
  bold_labels()
l6strtab1
```
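<table>
<caption><strong>Summary of covars distribution</strong></caption>
<thead>
<tr><th rowspan="2">Characteristic</th><th colspan="2"><strong>L=0/L=1</strong></th></tr>
<tr><th>0<br>N = 51<sup>1</sup></th><th>1<br>N = 49<sup>1</sup></th></tr>
</thead>
<tbody>
<tr><td><strong>N</strong></td><td></td><td></td></tr>
<tr><td>100</td><td>51 / 51 (100%)</td><td>49 / 49 (100%)</td></tr>
<tr><td><strong>Exposure</strong></td><td>10 / 51 (20%)</td><td>38 / 49 (78%)</td></tr>
<tr><td><strong>Outcome</strong></td><td>9 / 51 (18%)</td><td>38 / 49 (78%)</td></tr>
</tbody>
<tfoot><tr><td colspan="3"><sup>1</sup> n / N (%)</td></tr></tfoot>
</table>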
]

---
class: middle

## Confounders

.pull-left[
**L=0**

<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Exposure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Outcome; L=0</div></th> </tr> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> 0 </th> <th style="text-align:right;"> 1 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 0 </td> <td style="text-align:right;"> 36 </td> <td style="text-align:right;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 4 </td> </tr> </tbody> </table>
]

.pull-right[
**L=1**

<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Exposure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Outcome; L=1</div></th> </tr> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> 0 </th> <th style="text-align:right;"> 1 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 8 </td> </tr> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 30 </td> </tr> </tbody> </table>
]

```r
tab1 <- table(data$A, data$Y, data$L)  # A x Y tables within each level of L
#tab1
l6conf2 <- epiR::epi.2by2(tab1, method = "cohort.count")
tabl6conf2 <- data.table::as.data.table(l6conf2$massoc.summary)
```

---
class: middle

## Confounders

**Adjusted**

<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Measure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Estimate 95%CIs</div></th> </tr> <tr> <th style="text-align:left;"> Measure </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> LB </th> <th style="text-align:right;"> UB </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio (crude) </td> <td style="text-align:right;"> 2.57 </td> <td style="text-align:right;"> 1.61 </td> <td style="text-align:right;"> 4.11 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (M-H) </td> <td style="text-align:right;"> 1.42 </td> <td style="text-align:right;"> 0.87 </td> <td style="text-align:right;"> 2.30 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (crude:M-H) </td> <td style="text-align:right;"> 1.81 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude) </td> <td style="text-align:right;"> 7.29 </td> <td style="text-align:right;"> 3.01 </td> <td style="text-align:right;"> 17.63 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (M-H) </td> <td style="text-align:right;"> 2.46 </td> <td style="text-align:right;"> 0.84 </td> <td style="text-align:right;"> 7.21 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude:M-H) </td> <td style="text-align:right;"> 2.96 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude) * </td> <td style="text-align:right;"> 45.83 </td> <td style="text-align:right;"> 28.40 </td> <td style="text-align:right;"> 63.26 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (M-H) * </td> <td style="text-align:right;"> 16.69 </td> <td style="text-align:right;"> -16.33 </td> <td style="text-align:right;"> 49.71 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude:M-H) </td> <td style="text-align:right;"> 2.75 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table>

.small[*Outcomes per 100 population units]

---
class: middle

## Confounders?

Consider this DAG:

`$$C \to E \to Y$$`

- In this case, C is not a confounder of the effect of E on Y: C affects Y only through E, so it has no effect on Y that is independent of E.
- There will nevertheless be an observed association between C and Y, because C causes E and E causes Y.
- That association is not independent of E. **That is why the traditional criterion (association with the outcome) should be assessed within levels of the exposure.**
- Stratified by E, the association between C and Y is null when, as in this DAG, there is no direct effect of C on Y.

---
class: middle

## Confounders ?
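
The claim in the last bullet can be checked with a much larger sample. The sketch below uses base R and, as an assumption, the same probabilities as the N = 100 simulation that follows. It sets N = 10^6 so that sampling noise is negligible. `rr_by_C()` is a hypothetical helper defined only for this illustration.

```r
# Sketch (assumed parameters mirroring the slide code): C -> E -> Y, no direct C -> Y arrow
set.seed(704)
N <- 1e6                                      # large N: sampling noise is negligible
C <- rbinom(N, 1, 0.5)
E <- rbinom(N, 1, ifelse(C == 0, 0.8, 0.5))   # C affects E
Y <- rbinom(N, 1, ifelse(E == 0, 0.2, 0.5))   # only E affects Y

# Hypothetical helper: risk ratio of Y for C = 1 vs C = 0 among rows where keep is TRUE
rr_by_C <- function(keep) mean(Y[keep & C == 1]) / mean(Y[keep & C == 0])

rr_marginal <- rr_by_C(rep(TRUE, N))  # expected 0.35 / 0.44, about 0.80
rr_E0 <- rr_by_C(E == 0)              # expected about 1: no C-Y association within E = 0
rr_E1 <- rr_by_C(E == 1)              # expected about 1: no C-Y association within E = 1
round(c(marginal = rr_marginal, within_E0 = rr_E0, within_E1 = rr_E1), 2)
```

The marginal risk ratio departs from 1 only because C shifts the distribution of E. Within each level of E it returns to about 1.

---
class: middle

## Confounders ?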
.pull-left[
`\(C \to E \to Y\)`

```r
set.seed(704)
N <- 100
C <- rbinom(N, 1, 0.5)
E <- ifelse(C == 0, rbinom(N, 1, 0.8), rbinom(N, 1, 0.5))  # C affects E
Y <- ifelse(E == 0, rbinom(N, 1, 0.2), rbinom(N, 1, 0.5))  # only E affects Y
#summary(C)
data1 <- data.frame(N, C, E, Y)
tab1C <- table(data1$E, data1$Y, data1$C)  # E x Y tables within levels of C
```
]

.pull-right[
<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Measure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Estimate 95%CIs</div></th> </tr> <tr> <th style="text-align:left;"> Measure </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> LB </th> <th style="text-align:right;"> UB </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio (crude) </td> <td style="text-align:right;"> 1.91 </td> <td style="text-align:right;"> 1.36 </td> <td style="text-align:right;"> 2.70 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (M-H) </td> <td style="text-align:right;"> 1.75 </td> <td style="text-align:right;"> 1.14 </td> <td style="text-align:right;"> 2.69 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (crude:M-H) </td> <td style="text-align:right;"> 1.09 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude) </td> <td style="text-align:right;"> 5.12 </td> <td style="text-align:right;"> 2.02 </td> <td style="text-align:right;"> 12.97 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (M-H) </td> <td style="text-align:right;"> 3.83 </td> <td style="text-align:right;"> 1.45 </td> <td style="text-align:right;"> 10.09 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude:M-H) </td> <td style="text-align:right;"> 1.34 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude) * </td> <td style="text-align:right;"> 37.15 </td> <td style="text-align:right;"> 19.01 </td> <td style="text-align:right;"> 55.30 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (M-H) * </td> <td style="text-align:right;"> 31.20 </td> <td style="text-align:right;"> -6.71 </td> <td style="text-align:right;"> 69.12 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude:M-H) </td> <td style="text-align:right;"> 1.19 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table>
]

---
class: middle

## Confounders ?

Figure 7.4: a version of the famous M-diagram. There is no confounding, despite the backdoor path.

<img src="images/L6fig4HR.png" width="30%" />

Here there are no common causes of treatment A and outcome Y, and therefore there is no confounding. The backdoor path A ← U2 → L ← U1 → Y is blocked because `\(L\)` is a collider on that path.
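
---
class: middle

## Confounders ? M-diagram with a large sample

The blocking of the path by the collider L can be checked with a large simulated sample. This is a base-R sketch under assumed parameters, which differ from those on the next slide. U1 and U2 are independent. L depends on both, A depends only on U2, and Y depends only on U1. For illustration, A has no effect on Y, so any A-Y association is bias. `mh_rr()` is a hypothetical hand-written Mantel-Haenszel risk ratio, not an epiR function.

```r
# Sketch with assumed parameters: A <- U2 -> L <- U1 -> Y, and NO effect of A on Y
set.seed(704)
N  <- 1e6
U1 <- rbinom(N, 1, 0.5)
U2 <- rbinom(N, 1, 0.5)
L  <- rbinom(N, 1, 0.05 + 0.45 * U1 + 0.45 * U2)  # L is a collider of U1 and U2
A  <- rbinom(N, 1, 0.2 + 0.6 * U2)                # A depends on U2 only
Y  <- rbinom(N, 1, 0.2 + 0.6 * U1)                # Y depends on U1 only

# Hypothetical helper: Mantel-Haenszel risk ratio of y for a = 1 vs a = 0 over strata of z
mh_rr <- function(y, a, z) {
  num <- 0; den <- 0
  for (s in unique(z)) {
    i  <- z == s
    n  <- sum(i)
    n1 <- sum(a[i] == 1); n0 <- sum(a[i] == 0)
    # (n0 / n) is computed first to avoid integer overflow in large samples
    num <- num + sum(y[i] == 1 & a[i] == 1) * (n0 / n)  # exposed cases
    den <- den + sum(y[i] == 1 & a[i] == 0) * (n1 / n)  # unexposed cases
  }
  num / den
}

rr_crude <- mean(Y[A == 1]) / mean(Y[A == 0])  # about 1: the backdoor path is blocked at L
rr_adj_L <- mh_rr(Y, A, L)                     # away from 1: conditioning on L opens it
round(c(crude = rr_crude, adjusted_for_L = rr_adj_L), 2)
```

Under these assumptions the crude risk ratio stays at about 1. Stratifying on the collider L moves it to roughly 0.85, so adjustment for L creates bias rather than removing it.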
---
class: middle

## Confounders

.pull-left[
No common causes, but L is a collider: `\(U1 \to L \leftarrow U2\)`

```r
set.seed(704)
N <- 100
U1 <- rbinom(N, 1, 0.5)
U2 <- rbinom(N, 1, 0.5)
# L is affected by U1 and U2
L <- ifelse(U1 == 1, rbinom(N, 1, 0.6),
            ifelse(U2 == 1, rbinom(N, 1, 0.6), rbinom(N, 1, 0.5)))
# Intended: A is affected by U2 (NB: both branches use p = 0.5,
# so as written A does not actually depend on U2)
A <- ifelse(U2 == 1, rbinom(N, 1, 0.5), rbinom(N, 1, 0.5))
# Y is affected by A and U1
Y <- ifelse(A == 1, rbinom(N, 1, 0.6),
            ifelse(U1 == 1, rbinom(N, 1, 0.6), rbinom(N, 1, 0.5)))
datanoconf2 <- data.frame(N, U1, U2, L, A, Y)
tab.noconf2 <- table(datanoconf2$A, datanoconf2$Y, datanoconf2$L)
```
]

.pull-right[
<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Measure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Estimate 95%CIs</div></th> </tr> <tr> <th style="text-align:left;"> Measure </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> LB </th> <th style="text-align:right;"> UB </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio (crude) </td> <td style="text-align:right;"> 1.73 </td> <td style="text-align:right;"> 1.05 </td> <td style="text-align:right;"> 2.86 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (M-H) </td> <td style="text-align:right;"> 1.70 </td> <td style="text-align:right;"> 1.04 </td> <td style="text-align:right;"> 2.78 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (crude:M-H) </td> <td style="text-align:right;"> 1.02 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude) </td> <td style="text-align:right;"> 2.53 </td> <td style="text-align:right;"> 1.11 </td> <td style="text-align:right;"> 5.74 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (M-H) </td> <td style="text-align:right;"> 2.46 </td> <td style="text-align:right;"> 1.08 </td> <td style="text-align:right;"> 5.61 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude:M-H) </td> <td style="text-align:right;"> 1.03 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude) * </td> <td style="text-align:right;"> 22.00 </td> <td style="text-align:right;"> 3.21 </td> <td style="text-align:right;"> 40.79 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (M-H) * </td> <td style="text-align:right;"> 21.33 </td> <td style="text-align:right;"> 0.74 </td> <td style="text-align:right;"> 41.92 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude:M-H) </td> <td style="text-align:right;"> 1.03 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table>
]

---
class: middle

## Confounders

<img src="images/L6fig5HR.png" width="30%" />

<span style="color:purple"> There is an arrow `\(L \to A\)`. </span> This arrow creates an open backdoor path:

- A ← L ← U1 → Y. Because U1 is a common cause of A and Y, **confounding exists.**
- Conditioning on L would block that backdoor path but would simultaneously open a backdoor path on which L is a collider (A ← U2 → L ← U1 → Y).

The bias is **intractable:** attempting to block the confounding path opens a selection bias path.

---
class: middle

## Confounding ? Colliders?

.pull-left[

```r
dag <- ggdag::dagify(Y ~ A + U1,
                     A ~ L + U2,
                     L ~ U1 + U2,
                     exposure = "A",
                     outcome = "Y",
*                    latent = c("U1", "U2"),
                     coords = list(x = c(L = 3.2, Y = 3.8, A = 3.5, U1 = 3, U2 = 3),
                                   y = c(U2 = 1, L = 1.3, A = 1.3, Y = 1.3, U1 = 1.5)))

dag_plot <- dag %>%
  ggdag::tidy_dagitty(layout = "manual", seed = 704) %>%
  arrange(name) %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  #xlim()+
  geom_dag_point() +
  geom_dag_edges() +
  theme_dag() +
  geom_dag_node(color = "darkmagenta") +
  geom_dag_text(color = "white")
```
]

--

.pull-right[
<img src="L8_EPIB704_Confounding_II_files/figure-html/unnamed-chunk-19-1.png" width="90%" style="display: block; margin: auto auto auto 0;" />
]

---
class: middle

### Confounding ? Colliders?

```r
#control_for(dag, var = "L")
#ggdag_paths(dag) + theme_dag()
ggdag_adjust(dag, var = "L", stylized = TRUE, collider_lines = TRUE) + theme_dag()
```

<img src="L8_EPIB704_Confounding_II_files/figure-html/unnamed-chunk-20-1.png" width="90%" style="display: block; margin: auto 0 auto auto;" />

---
class: middle

## R can help ...
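
A large simulated sample makes the intractability concrete. This is a base-R sketch with assumed parameters that follow the DAG above (U1 → L ← U2, L → A ← U2, U1 → Y, A → Y). It uses an additive risk model so that the true causal risk difference is exactly 0.2. `std_rd()` is a hypothetical helper that standardizes the stratum-specific risk differences to the distribution of its stratifier.

```r
# Sketch with assumed parameters; DAG: U1 -> L <- U2, L -> A <- U2, U1 -> Y, A -> Y
set.seed(704)
N  <- 1e6
U1 <- rbinom(N, 1, 0.5)
U2 <- rbinom(N, 1, 0.5)
L  <- rbinom(N, 1, 0.05 + 0.45 * U1 + 0.45 * U2)
A  <- rbinom(N, 1, 0.1 + 0.4 * L + 0.4 * U2)
Y  <- rbinom(N, 1, 0.1 + 0.2 * A + 0.5 * U1)   # true causal risk difference = 0.2

# Hypothetical helper: risk difference standardized to the distribution of z
std_rd <- function(y, a, z) {
  rd <- 0
  for (s in unique(z)) {
    i  <- z == s
    rd <- rd + mean(i) * (mean(y[i & a == 1]) - mean(y[i & a == 0]))
  }
  rd
}

rd_crude <- std_rd(Y, A, rep(1L, N))  # biased: A <- L <- U1 -> Y is open
rd_L     <- std_rd(Y, A, L)           # biased: conditioning on L opens A <- U2 -> L <- U1 -> Y
rd_U1    <- std_rd(Y, A, U1)          # about 0.2, but U1 is unmeasured in practice
round(c(crude = rd_crude, adjusted_L = rd_L, adjusted_U1 = rd_U1), 3)
```

Under these assumptions the crude estimate (about 0.29) overstates the true risk difference of 0.2. The L-adjusted estimate (about 0.14) understates it. Only adjustment for the unmeasured U1 recovers it.

---
class: middle

## R can help ...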
```r
g <- dagitty::paths(dag, "A", "Y")
n_paths <- paste0("There are ", length(g$paths), " pathways from A to Y")
n_open  <- paste0("Of these pathways, ", sum(g$open), " are open")
adj_set <- paste0("The adjustment sets are ",
                  dagitty::adjustmentSets(dag, "A", "Y", type = "canonical"))
print(c(n_paths, n_open, adj_set))
```

```
## [1] "There are 3 pathways from A to Y"
## [2] "Of these pathways, 2 are open"
## [3] "The adjustment sets are "
```

The two open paths are the causal path A → Y and the backdoor path A ← L ← U1 → Y. Because U1 and U2 are unmeasured, no valid adjustment set exists. The bias is **.red[intractable:]** attempting to block the confounding path opens a selection bias path.

---
class: middle

## Confounders

.pull-left[

```r
set.seed(704)
N <- 100
U1 <- rbinom(N, 1, 0.5)
U2 <- rbinom(N, 1, 0.5)
# L is affected by U1 and U2
L <- ifelse(U1 == 1, rbinom(N, 1, 0.65),
            ifelse(U2 == 1, rbinom(N, 1, 0.65), rbinom(N, 1, 0.15)))
# A is affected by L and U2
A <- ifelse(L == 1, rbinom(N, 1, 0.65),
            ifelse(U2 == 1, rbinom(N, 1, 0.65), rbinom(N, 1, 0.45)))
# Y is affected by A and U1
Y <- ifelse(A == 1, rbinom(N, 1, 0.65),
            ifelse(U1 == 1, rbinom(N, 1, 0.6), rbinom(N, 1, 0.3)))
data2 <- data.frame(N, U1, U2, L, A, Y)
tabL.intract <- table(data2$A, data2$Y, data2$L)
```
]

.pull-right[
<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Measure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Estimate 95%CIs</div></th> </tr> <tr> <th style="text-align:left;"> Measure </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> LB </th> <th style="text-align:right;"> UB </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio (crude) </td> <td style="text-align:right;"> 1.91 </td> <td style="text-align:right;"> 1.16 </td> <td style="text-align:right;"> 3.13 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (M-H) </td> <td style="text-align:right;"> 2.12 </td> <td style="text-align:right;"> 1.18 </td> <td style="text-align:right;"> 3.80 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (crude:M-H) </td> <td style="text-align:right;"> 0.90 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude) </td> <td style="text-align:right;"> 3.00 </td> <td style="text-align:right;"> 1.31 </td> <td style="text-align:right;"> 6.88 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (M-H) </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 1.35 </td> <td style="text-align:right;"> 7.28 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude:M-H) </td> <td style="text-align:right;"> 0.96 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude) * </td> <td style="text-align:right;"> 25.97 </td> <td style="text-align:right;"> 7.09 </td> <td style="text-align:right;"> 44.85 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (M-H) * </td> <td style="text-align:right;"> 28.01 </td> <td style="text-align:right;"> 5.35 </td> <td style="text-align:right;"> 50.67 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude:M-H) </td> <td style="text-align:right;"> 0.93 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table>
]

---
class: middle

## Confounders

Figure 7.7 is another example without confounding in which the traditional criteria lead to selection bias through adjustment for L.

<img src="images/L6fig7HR.png" width="30%" />

- The traditional criteria would not have resulted in bias had condition (3) been replaced by the condition that L is not caused by treatment.
- .red[ _(3) it does not lie on a causal pathway between treatment and outcome._]

> Replace condition (3) by the condition that “there exist variables L and U such that there is conditional exchangeability within their joint levels, `\(Y^a \perp \!\!\! \perp A \mid L, U\)`”. .blue[H&R, Technical Point 7.2]

---
class: middle

## Confounders

.pull-left[
**L is not on the "pathway" `\(A \to Y\)`**

```r
set.seed(704)
N <- 100
U <- rbinom(N, 1, 0.5)
A <- rbinom(N, 1, 0.55)  # A is not caused by U; A affects L
# L is affected by U and A
L <- ifelse(U == 1, rbinom(N, 1, 0.65),
            ifelse(A == 1, rbinom(N, 1, 0.65), rbinom(N, 1, 0.25)))
Y <- ifelse(U == 1, rbinom(N, 1, 0.6), rbinom(N, 1, 0.25))  # Y is affected by U only
datanoconf3 <- data.frame(N, U, L, A, Y)
tabL.noconf3 <- table(datanoconf3$A, datanoconf3$Y, datanoconf3$L)
```
]

.pull-right[
<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Measure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Estimate 95%CIs</div></th> </tr> <tr> <th style="text-align:left;"> Measure </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> LB </th> <th style="text-align:right;"> UB </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio (crude) </td> <td style="text-align:right;"> 0.98 </td> <td style="text-align:right;"> 0.67 </td> <td style="text-align:right;"> 1.42 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (M-H) </td> <td style="text-align:right;"> 0.98 </td> <td style="text-align:right;"> 0.61 </td> <td style="text-align:right;"> 1.57 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (crude:M-H) </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude) </td> <td style="text-align:right;"> 0.96 </td> <td style="text-align:right;"> 0.42 </td> <td style="text-align:right;"> 2.18 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (M-H) </td> <td style="text-align:right;"> 0.97 </td> <td style="text-align:right;"> 0.42 </td> <td style="text-align:right;"> 2.23 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude:M-H) </td> <td style="text-align:right;"> 0.99 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude) * </td> <td style="text-align:right;"> -1.10 </td> <td style="text-align:right;"> -21.55 </td> <td style="text-align:right;"> 19.36 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (M-H) * </td> <td style="text-align:right;"> -0.85 </td> <td style="text-align:right;"> -25.69 </td> <td style="text-align:right;"> 23.99 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude:M-H) </td> <td style="text-align:right;"> 1.30 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table>
]

---
### Surrogate confounders (Is L a confounder?)
In Figure 7.8, the effect of A on Y is confounded by the unmeasured common cause U.

<img src="images/L6fig8HR.png" width="50%" />

- The measured variable L is a proxy, or surrogate, for U. Should we adjust for L?
- On the one hand, L is not a confounder because it does not lie on a backdoor path between A and Y.

---
class: middle

## Confounders

.pull-left[
**Surrogates when L is not highly correlated with U**

```r
set.seed(704)
N <- 100
U <- rbinom(N, 1, 0.8)
A <- ifelse(U == 1, rbinom(N, 1, 0.65), rbinom(N, 1, 0.5))  # A is affected by U
L <- ifelse(U == 1, rbinom(N, 1, 0.65), rbinom(N, 1, 0.5))  # L is (weakly) affected by U
# Y is affected by U and A
Y <- ifelse(A == 1, rbinom(N, 1, 0.65),
            ifelse(U == 1, rbinom(N, 1, 0.65), rbinom(N, 1, 0.15)))
dataconf4 <- data.frame(N, U, L, A, Y)
tabL.conf4 <- table(dataconf4$A, dataconf4$Y, dataconf4$L)
```
]

.pull-right[
<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Measure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Estimate 95%CIs</div></th> </tr> <tr> <th style="text-align:left;"> Measure </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> LB </th> <th style="text-align:right;"> UB </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio (crude) </td> <td style="text-align:right;"> 1.94 </td> <td style="text-align:right;"> 1.22 </td> <td style="text-align:right;"> 3.08 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (M-H) </td> <td style="text-align:right;"> 2.01 </td> <td style="text-align:right;"> 1.23 </td> <td style="text-align:right;"> 3.30 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (crude:M-H) </td> <td style="text-align:right;"> 0.97 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude) </td> <td style="text-align:right;"> 3.29 </td> <td style="text-align:right;"> 1.39 </td> <td style="text-align:right;"> 7.78 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (M-H) </td> <td style="text-align:right;"> 3.46 </td> <td style="text-align:right;"> 1.41 </td> <td style="text-align:right;"> 8.50 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude:M-H) </td> <td style="text-align:right;"> 0.95 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude) * </td> <td style="text-align:right;"> 28.52 </td> <td style="text-align:right;"> 8.61 </td> <td style="text-align:right;"> 48.44 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (M-H) * </td> <td style="text-align:right;"> 29.58 </td> <td style="text-align:right;"> -5.76 </td> <td style="text-align:right;"> 64.92 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude:M-H) </td> <td style="text-align:right;"> 0.96 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table>
]

---
### Surrogate confounders (Is L a confounder?)
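
How much of the confounding by U does adjustment for a surrogate remove? This base-R sketch uses assumed parameters, not those of the surrounding slides. A has no effect on Y, so the true risk ratio is 1. L agrees with U with probability `q`. `mh_rr()` is a hypothetical hand-written Mantel-Haenszel risk ratio.

```r
# Sketch with assumed parameters: U -> A, U -> Y, U -> L (surrogate); no A -> Y effect
set.seed(704)
N <- 1e6
U <- rbinom(N, 1, 0.5)
A <- rbinom(N, 1, 0.2 + 0.5 * U)
Y <- rbinom(N, 1, 0.1 + 0.5 * U)          # true risk ratio for A is 1

# Hypothetical helper: Mantel-Haenszel risk ratio of y for a = 1 vs a = 0 over strata of z
mh_rr <- function(y, a, z) {
  num <- 0; den <- 0
  for (s in unique(z)) {
    i  <- z == s
    n  <- sum(i)
    n1 <- sum(a[i] == 1); n0 <- sum(a[i] == 0)
    num <- num + sum(y[i] == 1 & a[i] == 1) * (n0 / n)  # (n0 / n) first: avoids integer overflow
    den <- den + sum(y[i] == 1 & a[i] == 0) * (n1 / n)
  }
  num / den
}

rr_crude <- mean(Y[A == 1]) / mean(Y[A == 0])   # confounded by U (about 2.07)
q <- c(0.6, 0.8, 0.95, 1)                       # P(L = U): how closely L tracks U
rr_adj <- sapply(q, function(qq) {
  L <- ifelse(rbinom(N, 1, qq) == 1, U, 1 - U)  # surrogate equal to U with probability qq
  mh_rr(Y, A, L)
})
round(c(crude = rr_crude, setNames(rr_adj, paste0("q = ", q))), 2)
```

Under these assumptions the L-adjusted risk ratio falls steadily toward the true value of 1 as L tracks U more closely. It reaches 1 only when L = U.

---
### Surrogate confounders (Is L a confounder?)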
- On the other hand, adjusting for L, which is associated with U , will indirectly adjust for some of the confounding caused by U . <img src="images/L6fig8HR.png" width="30%" /> - In the extreme case that L were perfectly correlated with U then adjusting for L = adjusting for U. - Therefore we will typically prefer to adjust, rather than not to adjust, for L. --- class: middle ## Confounders **Surrogates when L and U correlated** .pull-left[ ```r set.seed(704) N <- 100 U <- rbinom(N,1,0.8) A <- ifelse(U==1, rbinom(N,1,0.65), rbinom(N,1,0.5)) #A is affected by U L <- ifelse(U==1, rbinom(N,1,0.95), rbinom(N,1,0.5)) #L is affected by U Y <- ifelse(A==1, rbinom(N,1, 0.65), ifelse(U==1, rbinom(N,1,0.65), rbinom(N,1,0.15))) # Y is affected by U and A dataconf5 <- data.frame(N, U, L, A,Y) tabL.conf5 <- table(dataconf5$A, dataconf5$Y, dataconf5$L) ``` ] .pull-right[ <table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Measure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Estimate 95%CIs</div></th> </tr> <tr> <th style="text-align:left;"> Measure </th> <th style="text-align:right;"> Est. 
</th> <th style="text-align:right;"> LB </th> <th style="text-align:right;"> UB </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio (crude) </td> <td style="text-align:right;"> 1.94 </td> <td style="text-align:right;"> 1.22 </td> <td style="text-align:right;"> 3.08 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (M-H) </td> <td style="text-align:right;"> 1.74 </td> <td style="text-align:right;"> 1.06 </td> <td style="text-align:right;"> 2.88 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (crude:M-H) </td> <td style="text-align:right;"> 1.11 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude) </td> <td style="text-align:right;"> 3.29 </td> <td style="text-align:right;"> 1.39 </td> <td style="text-align:right;"> 7.78 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (M-H) </td> <td style="text-align:right;"> 2.69 </td> <td style="text-align:right;"> 1.11 </td> <td style="text-align:right;"> 6.53 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude:M-H) </td> <td style="text-align:right;"> 1.22 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude) * </td> <td style="text-align:right;"> 28.52 </td> <td style="text-align:right;"> 8.61 </td> <td style="text-align:right;"> 48.44 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (M-H) * </td> <td style="text-align:right;"> 23.16 </td> <td style="text-align:right;"> -8.90 </td> <td style="text-align:right;"> 55.21 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude:M-H) </td> <td style="text-align:right;"> 1.23 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table> ] --- class: middle ###Confounders cannot be descendants of treatment, but can be in the future of treatment In Figure 
7.11, L is a descendant of treatment A that blocks all backdoor paths from A to Y. <img src="images/L6fig11HR.png" width="30%" style="display: block; margin: auto;" /> -- - Conditioning on L does not cause selection bias because no collider path is opened. - However, because the causal effect of A on Y is entirely mediated by L, conditioning on L also blocks this causal pathway. - This shows that adjusting for a variable L that blocks all backdoor paths does not eliminate bias when L is a descendant of A. - Because conditional exchangeability `\(Y^a \perp \!\!\! \perp A|L\)` would imply that adjustment for L eliminates all bias, conditional exchangeability must fail here, - **and thus `\(E[Y^{a=1}] - E[Y^{a=0}]\)` is not identified.** --- class: middle ## Confounding? Colliders? .pull-left[ ```r dag1 <- ggdag::dagify(Y ~ L, A ~ U, L ~ U + A, exposure = "A", outcome = "Y", latent = "U", coords = list(x = c(L = 2, Y = 2.5, A = 1.5, U=1), y = c(U = 1, L = 1.5, A=1.5, Y=1.5))) dag_plot1 <- dag1 %>% ggdag::tidy_dagitty(layout = "manual", seed = 704) %>% arrange(name) %>% ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + #xlim()+ geom_dag_point() + geom_dag_edges() + theme_dag() + geom_dag_node(color="darkmagenta") + geom_dag_text(color="white") ``` ] -- .pull-right[ <img src="L8_EPIB704_Confounding_II_files/figure-html/unnamed-chunk-35-1.png" width="90%" style="display: block; margin: auto auto auto 0;" /> ] --- class: middle ### Confounders as descendants? Colliders? ```r #control_for(dag1, var = "L") #ggdag_paths(dag1) +theme_dag() ggdag_adjust(dag1, var = "L", stylized = T, collider_lines = T) + theme_dag() ``` <img src="L8_EPIB704_Confounding_II_files/figure-html/unnamed-chunk-36-1.png" width="90%" style="display: block; margin: auto 0 auto auto;" /> --- class: middle ##R can help ...
```r g1 <- dagitty::paths(dag1, "A", "Y") a1 <- paste0("There are ", length(g1$paths), " pathways from A to Y") b1 <- paste0("Of these pathways, ", sum(g1$open), " are open") c1 <- paste0("The adjustment sets are ", dagitty::adjustmentSets(dag1, "A", "Y", type = "canonical")) print(c(a1,b1,c1)) ``` ``` ## [1] "There are 2 pathways from A to Y" ## [2] "Of these pathways, 2 are open" ## [3] "The adjustment sets are " ``` No valid adjustment set exists: conditioning on L blocks the backdoor path `\(A \leftarrow U \to L \to Y\)`, but it also blocks the causal path `\(A \to L \to Y\)`, and U is unmeasured. **.red[Thus `\(E[Y^{a=1}] - E[Y^{a=0}]\)` is not identified.]** --- class: middle ### Do we know what a confounder is? <img src="images/L6confoundingvariable.png" width="60%" style="display: block; margin: auto;" /> [Confounding Variable Joke](http://www.bazinganomics.com/bazinganomics//confounding-variable-joke) --- class: middle ## How to adjust for confounding <img src="images/confcontrol.png" width="75%" style="display: block; margin: auto;" /> --- class: middle ## How to adjust for confounding - **Randomization is the best method.** - In conditionally randomized experiments given covariates `\(L\)`, the common causes (i.e., the covariates L) are measured and thus the adjusted (standardization or IP weighting) association measure is expected to equal the effect measure. - Subject-matter knowledge to identify adjustment variables is **_discretionary_ in "ideal" randomized experiments**. - On the other hand, **subject-matter knowledge is key (a must!) in observational studies** in order to identify and measure adjustment variables (e.g., for regression adjustment). --- class: middle ## How to adjust for confounding - Causal inference from observational data relies on the **uncheckable assumption** that we have used our knowledge to identify and measure a set of variables `\(L\)` that is a sufficient set for confounding adjustment: - The set of non-descendants of treatment that includes enough variables to block all backdoor paths.
- Under this assumption of no unmeasured confounding (conditional exchangeability given `\(L\)`), standardization and inverse probability (IP) weighting can be used to compute the average causal effect in the population. --- class: middle ### Standardization **Why standardize?** - To control for confounding - To summarize many estimates into one - It is a weighted average of measures of occurrence across the distribution of a covariate (say, age). - It can be applied to any measure of occurrence or measure of effect - Weights are chosen based on the population of interest (ME3, pg. 49) --- ### Standardized measures of association and effect - Let `\(I_k\)` represent stratum-specific incidence rates, and - let `\(I_k^*\)` represent another schedule of such rates (perhaps based on a different exposure distribution) - Let `\(T_k\)` represent person-time at risk in each stratum `\(I_s = \left(\frac{\sum_{k=1}^K T_k I_k}{\sum_{k=1}^K T_k }\right)\)` `\(I_s^* = \left(\frac{\sum_{k=1}^K T_k I_k^*}{\sum_{k=1}^K T_k }\right)\)` - Then the standardized rate ratio is: `\(IR_s = I_s/I_s^*\)` - The standardized rate difference is: `\(ID_s = I_s - I_s^* = \frac{\sum_{k=1}^K T_k (I_k - I_k^*)}{\sum_{k=1}^K T_k}\)` (ME3, pg. 67) --- class: middle ## Standardized measures of association and effect - Note that the standardized rate difference is a weighted average of stratum-specific rate differences, with weights `\(T_k\)` **Interpretation of both measures:** - Effects of exposure on the standard population. - For the standardized rate ratio we need to assume that the relative distribution of person-time would be unaffected by exposure. - Standardized <span style="color:purple"> risk ratios </span> do not require this assumption because the denominators do not use person-time.
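---
class: middle

#### Standardized rates: a sketch in R

A minimal base-R sketch of the standardized rate ratio and rate difference formulas, using hypothetical (made-up) numbers for two strata: person-time `T_k` and two rate schedules `I_k` and `I_star_k`. The helper `std_rates()` is illustrative, not a function from any package. The last line checks that the standardized rate difference equals the `T_k`-weighted average of the stratum-specific rate differences.

```r
# Hypothetical data: K = 2 strata (e.g., two age groups); all numbers are made up.
T_k      <- c(1000, 3000)   # person-years at risk in each stratum (standard weights)
I_k      <- c(0.02, 0.05)   # stratum-specific rates, index schedule (e.g., exposed)
I_star_k <- c(0.01, 0.02)   # stratum-specific rates, comparison schedule (e.g., unexposed)

# Hypothetical helper: standardized rates, rate ratio (IR_s) and rate difference (ID_s)
std_rates <- function(T_k, I_k, I_star_k) {
  I_s      <- sum(T_k * I_k) / sum(T_k)        # T_k-weighted average of I_k
  I_star_s <- sum(T_k * I_star_k) / sum(T_k)   # T_k-weighted average of I_k^*
  list(I_s = I_s, I_star_s = I_star_s,
       IR_s = I_s / I_star_s,                  # standardized rate ratio
       ID_s = I_s - I_star_s)                  # standardized rate difference
}

res <- std_rates(T_k, I_k, I_star_k)
res$IR_s   # 0.0425 / 0.0175, about 2.43
res$ID_s   # 0.025 per person-year

# Same difference as the T_k-weighted average of stratum-specific differences
weighted.mean(I_k - I_star_k, w = T_k)   # 0.025
```

Both schedules use the same weights `T_k`; that shared standard is what makes the comparison free of confounding by the stratifying variable.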
--- ### Example: [COVID-19 vaccine effectiveness in the UK](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1025358/Vaccine-surveillance-report-week-41.pdf) UK Health Security Agency "COVID-19 vaccine surveillance report", Week 41 <img src="images/L6casesUK.png" width="60%" style="display: block; margin: auto;" /> **Rates (per 100,000) by vaccination status from week 37 to week 40, 2021** --- #### Example: COVID-19 vaccine effectiveness in the UK (2) [Numbers by variant are reported by Public Health England](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1018547/Technical_Briefing_23_21_09_16.pdf). <img src="images/L6reportUKage.png" width="70%" style="display: block; margin: auto;" /> From: Table 5. Attendance to emergency care and deaths of sequenced and genotyped Delta cases in England by vaccination status (1 February 2021 to 12 September 2021) ([here](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1018547/Technical_Briefing_23_21_09_16.pdf)).
--- #### Example: COVID-19 vaccine effectiveness in the UK **Let's play with the numbers (1):** <span style="color:blue"> check the risk difference (RD) </span> ```r # 157400 - 2361 = 155039 exposed without outcome # 257357 - 3080 = 254277 unexposed without outcome l6UKdata<-c(2361,155039, 3080, 254277) l6UKest<- epi.2by2(l6UKdata, method = "cohort.count") l6UKest ``` ``` ## Outcome + Outcome - Total Inc risk * ## Exposed + 2361 155039 157400 1.50 (1.44 to 1.56) ## Exposed - 3080 254277 257357 1.20 (1.16 to 1.24) ## Total 5441 409316 414757 1.31 (1.28 to 1.35) ## ## Point estimates and 95% CIs: ## ------------------------------------------------------------------- ## Inc risk ratio 1.25 (1.19, 1.32) ## Inc odds ratio 1.26 (1.19, 1.33) ## Attrib risk in the exposed * 0.30 (0.23, 0.38) ## Attrib fraction in the exposed (%) 20.21 (15.85, 24.35) ## Attrib risk in the population * 0.12 (0.06, 0.17) ## Attrib fraction in the population (%) 8.77 (6.64, 10.86) ## ------------------------------------------------------------------- ## Uncorrected chi2 test that OR = 1: chi2(1) = 69.360 Pr>chi2 = <0.001 ## Fisher exact test that OR = 1: Pr>chi2 = <0.001 ## Wald confidence limits ## CI: confidence interval ## * Outcomes per 100 population units ``` **What would we conclude?** --- ### Example: COVID-19 vaccine effectiveness in the UK - **<span style="color:blue"> Missing something? </span>** <img src="images/L6reportUKage1.png" width="75%" style="display: block; margin: auto;" /> From: Table 5. Attendance to emergency care and deaths of sequenced and genotyped Delta cases in England by vaccination status (1 February 2021 to 12 September 2021) ([here](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1018547/Technical_Briefing_23_21_09_16.pdf)). .small[Note: The totals do not exactly sum up to the previous table, as age was missing in a few cases.]
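---
class: middle

#### Direct standardization by age: a sketch in R

The crude comparison ignores age. A minimal base-R sketch of direct standardization using the age-stratified counts from Table 5 (the same counts tabulated in the 2×2 tables of the following slides), taking everyone with known age, vaccinated and unvaccinated combined, as the standard population. The data frame layout and object names are illustrative.

```r
# Age-stratified counts from Table 5 (PHE Technical Briefing 23)
uk <- data.frame(
  age   = c("<50", ">=50", "<50", ">=50"),
  vax   = c(1, 1, 0, 0),                  # 1 = vaccinated, 0 = unvaccinated
  cases = c(453, 1908, 2416, 664),        # outcome +
  n     = c(85407, 71991, 248803, 8551)   # total in each age x vaccination cell
)
uk$risk <- uk$cases / uk$n                # stratum-specific risks

# Standard population: everyone with known age (vaccinated + unvaccinated)
std_n <- tapply(uk$n, uk$age, sum)        # <50: 334210; >=50: 80542
std_w <- std_n / sum(std_n)               # Pr(L = l), the standard weights

# Direct standardization: sum over l of risk(a, l) * Pr(L = l)
std_risk <- sapply(c(vax = 1, unvax = 0), function(a) {
  s <- uk[uk$vax == a, ]
  sum(s$risk * std_w[s$age])
})

round(100 * std_risk, 2)                  # vax 0.94, unvax 2.29 (per 100)
rd <- std_risk[["vax"]] - std_risk[["unvax"]]
rr <- std_risk[["vax"]] / std_risk[["unvax"]]
round(c(RD_per100 = 100 * rd, RR = rr), 2) # RD -1.35, RR 0.41
```

The standardized comparison reverses the direction of the crude one (crude RR 1.25), because vaccinated people are much more concentrated in the older, higher-risk age group.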
--- class: middle #### Example: COVID-19 vaccine effectiveness in the UK **Let's play with the numbers (2) - Standardization** .pull-left-narrow[ **Outcomes among people under 50 years** ```r l6UKdatu50<-c(453,84954, 2416, 246387) l6UKt1u50<- epi.2by2(l6UKdatu50, method = "cohort.count") l6UKt1u50$tab ``` ``` ## Outcome + Outcome - Total Inc risk * ## Exposed + 453 84954 85407 0.53 (0.48 to 0.58) ## Exposed - 2416 246387 248803 0.97 (0.93 to 1.01) ## Total 2869 331341 334210 0.86 (0.83 to 0.89) ``` ] .pull-right-narrow[ **Outcomes among people `\(\geq\)` 50 years** ```r l6UKdatm50<-c(1908 , 70083, 664, 7887) l6UKt1m50<- epi.2by2(l6UKdatm50, method = "cohort.count") l6UKt1m50$tab ``` ``` ## Outcome + Outcome - Total Inc risk * ## Exposed + 1908 70083 71991 2.65 (2.53 to 2.77) ## Exposed - 664 7887 8551 7.77 (7.21 to 8.35) ## Total 2572 77970 80542 3.19 (3.07 to 3.32) ``` ] --- #### Example: COVID-19 vaccine effectiveness in the UK **Let's play with the numbers (3)** <span style="color:blue"> check the risk differences (RD)</span> .pull-left[ **Outcomes among people under 50 years** <table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Measure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Estimate 95%CIs</div></th> </tr> <tr> <th style="text-align:left;"> Measure </th> <th style="text-align:right;"> Est. 
</th> <th style="text-align:right;"> LB </th> <th style="text-align:right;"> UB </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio </td> <td style="text-align:right;"> 0.55 </td> <td style="text-align:right;"> 0.49 </td> <td style="text-align:right;"> 0.60 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio </td> <td style="text-align:right;"> 0.54 </td> <td style="text-align:right;"> 0.49 </td> <td style="text-align:right;"> 0.60 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk * </td> <td style="text-align:right;"> -0.44 </td> <td style="text-align:right;"> -0.50 </td> <td style="text-align:right;"> -0.38 </td> </tr> <tr> <td style="text-align:left;"> Attrib fraction in exposed (%) </td> <td style="text-align:right;"> -83.08 </td> <td style="text-align:right;"> -102.34 </td> <td style="text-align:right;"> -65.65 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk in population * </td> <td style="text-align:right;"> -0.11 </td> <td style="text-align:right;"> -0.16 </td> <td style="text-align:right;"> -0.06 </td> </tr> <tr> <td style="text-align:left;"> Attrib fraction in population (%) </td> <td style="text-align:right;"> -13.12 </td> <td style="text-align:right;"> -14.92 </td> <td style="text-align:right;"> -11.34 </td> </tr> </tbody> </table> ] .pull-right[ **Outcomes among people `\(\geq\)` 50 years** <table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Measure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Estimate 95%CIs</div></th> </tr> <tr> <th style="text-align:left;"> Measure </th> <th 
style="text-align:right;"> Est. </th> <th style="text-align:right;"> LB </th> <th style="text-align:right;"> UB </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio </td> <td style="text-align:right;"> 0.34 </td> <td style="text-align:right;"> 0.31 </td> <td style="text-align:right;"> 0.37 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio </td> <td style="text-align:right;"> 0.32 </td> <td style="text-align:right;"> 0.30 </td> <td style="text-align:right;"> 0.35 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk * </td> <td style="text-align:right;"> -5.11 </td> <td style="text-align:right;"> -5.69 </td> <td style="text-align:right;"> -4.54 </td> </tr> <tr> <td style="text-align:left;"> Attrib fraction in exposed (%) </td> <td style="text-align:right;"> -192.99 </td> <td style="text-align:right;"> -219.11 </td> <td style="text-align:right;"> -169.00 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk in population * </td> <td style="text-align:right;"> -4.57 </td> <td style="text-align:right;"> -5.15 </td> <td style="text-align:right;"> -3.99 </td> </tr> <tr> <td style="text-align:left;"> Attrib fraction in population (%) </td> <td style="text-align:right;"> -143.17 </td> <td style="text-align:right;"> -159.10 </td> <td style="text-align:right;"> -128.21 </td> </tr> </tbody> </table> ] --- class: middle #### Example: COVID-19 vaccine effectiveness in the UK **.red[Confounding?]** <img src="L8_EPIB704_Confounding_II_files/figure-html/unnamed-chunk-48-1.png" width="45%" style="display: block; margin: auto;" /> **<span style="color:blue">In real life, "L" includes a vector (set) of potential covariates that could be considered confounders; this is an illustration only!</span>** --- class: middle ### Direct standardization Suppose we want to estimate `\(RD = E[Y^{a=1}] - E[Y^{a=0}]\)`. Assume conditional exchangeability, `\(Y^a \perp \!\!\!
\perp A | L\)`. By the [law of total expectation](https://en.wikipedia.org/wiki/Law_of_total_expectation): <span style="color:purple"> `\(E[Y^{a=1}] = \sum_{l} E[Y^{a=1} | L=l]Pr(L=l)\)` </span>; <span style="color:blue"> `\(E[Y^{a=0}] = \sum_{l} E[Y^{a=0} | L=l]Pr(L=l)\)` </span> - `\(\sum_{l}\)` means sum over all values l of L that occur in the study population. - `\(Pr(L=l)\)` refers to the distribution of L in that population. `\(RD = E[Y^{a=1}] - E[Y^{a=0}] =\)` <span style="color:purple"> `\(\sum_{l} E[Y^{a=1} | L=l]Pr(L=l)\)` </span> `\(-\)` <span style="color:blue"> `\(\sum_{l} E[Y^{a=0} | L=l]Pr(L=l)\)` </span> --- class: middle #### Example: COVID-19 vaccine effectiveness in the UK **Let's play with the numbers - Standardization** .pull-left-narrow[ **Outcomes among people under 50 years** ``` ## Outcome + Outcome - Total Inc risk * ## Exposed + 453 84954 85407 0.53 (0.48 to 0.58) ## Exposed - 2416 246387 248803 0.97 (0.93 to 1.01) ## Total 2869 331341 334210 0.86 (0.83 to 0.89) ``` ] .pull-right-narrow[ **Outcomes among people `\(\geq\)` 50 years** ``` ## Outcome + Outcome - Total Inc risk * ## Exposed + 1908 70083 71991 2.65 (2.53 to 2.77) ## Exposed - 664 7887 8551 7.77 (7.21 to 8.35) ## Total 2572 77970 80542 3.19 (3.07 to 3.32) ``` ] --- #### Example: COVID-19 vaccine effectiveness in the UK **Let's play with the numbers (4) - Standardization** To compute these potential-outcome means from observed data, we use conditional exchangeability together with consistency (and positivity), so that `\(E[Y^{a} | L=l] = E[Y | A=a, L=l]\)`: `\(RD =\)` <span style="color:purple"> `\(\sum_{l} E[Y |A=1, L=l]Pr(L=l)\)` </span> `\(-\)` <span style="color:blue"> `\(\sum_{l} E[Y |A=0, L=l]Pr(L=l)\)` </span> -- <span style="color:purple"> Standardized risk in the vaccinated: </span> `\((453/85,407 × 334,210/414,752 + 1,908/71,991 × 80,542/414,752)≈ 0.94\%\)` `\(R_{vax}=\)` 0.94% <span style="color:blue"> Standardized risk in the unvaccinated: </span> `\((2,416/248,803 × 334,210/414,752 + 664/8,551 × 80,542/414,752)≈ 2.29\%\)` `\(R_{unvax}=\)` 2.29% -- **Standardized RD = -1.35%** from
(` 0.94% − 2.29% = −1.35%`) `\(\neq 0.30\)` in the crude estimates (per 100). **Standardized RR = 0.41** from (` 0.0094 / 0.0229`) `\(\neq 1.25\)` in the crude estimates. --- class: middle ### Example: [COVID-19 vaccine effectiveness in the UK](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1025358/Vaccine-surveillance-report-week-41.pdf) UK Health Security Agency "COVID-19 vaccine surveillance report", Week 41 .pull-left[ <img src="images/L6ratesUK.png" width="90%" /> ] .pull-right[ <img src="images/L6casesUK.png" width="90%" /> ] Cases presenting to emergency care (within 28 days of a positive test) resulting in overnight inpatient admission ([here](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1025358/Vaccine-surveillance-report-week-41.pdf)). --- class: middle ## What about the Mantel-Haenszel Methods? - _Cochran-Mantel-Haenszel_ methods are useful for estimating associations when only a few covariates are involved. - They take the effect in each stratum of `\(L\)` or `\(Z\)` (our third variable), - and combine these measures across `\(L\)` using calculated weights `\(^1\)`, for example: $$ RD_{M-H} = \left(\frac{\sum_l (RD_l w_l)} {\sum_l w_l}\right) = \left(\frac{RD_0 w_0 + RD_1 w_1}{ w_0 + w_1}\right)$$ - <span style="color:blue"> They are expected to work in closed cohorts and **.red[assume homogeneity across strata!!]** </span> - They are of limited use with a large set of covariates `\(L\)` and in the presence of effect measure modification and/or interaction. `\(^1\)` There are specific formulas for RD, RR and ORs as well. --- class: middle ### Standardized measures of association and effect - No assumption of homogeneity ("agnostic of the distribution"): model-based direct standardization `\(^1\)` is used when `\(L (X, E, A)\)` consists of a large vector of covariates.
It involves two steps: * Fitting a regression model for the outcome given exposure and covariates * Averaging the model-predicted outcomes under each exposure level over the covariate distribution of the standard population. `\(^1\)` More on "advanced" techniques to address confounding empirically after we deal with regressions. --- class: middle ## Standardized Morbidity Ratio (SMR) - A special case of standardization in which the standard population is the exposed sub-population. - In this case, the standardized rate ratio becomes: `$$SMR = \left(\frac{\sum_{k=1}^K T_k I_k}{\sum_{k=1}^K T_k I_k^* }\right) = \left(\frac{\sum_{k=1}^K A_k}{\sum_{k=1}^K T_k I_k^*}\right)$$` where `\(T_k\)` is the exposed person-time and `\(A_k\)` the number of exposed cases in stratum `\(k\)`. **<span style="color:purple"> [Numerator] </span>** cases occurring in the exposed (**Observed**) **<span style="color:blue">[Denominator] </span>** cases **expected** to occur in the absence of exposure, if exposure does not affect person-time at risk (ME3, pg. 68-69) --- class: middle ### How to adjust for confounding Standardization and inverse probability (IP) weighting are not the only methods. `$$IPW_l = \left(\frac{1}{Pr(A=a|L=l)}\right)$$` The probabilities `\(Pr(A=a|L=l)\)` are often estimated with regression models, **assuming the model specification is correct!** 😬 -- **<span style="color:purple"> IPW removes the arrow from `\(L \to A\)`: </span>** <img src="L8_EPIB704_Confounding_II_files/figure-html/unnamed-chunk-53-1.png" width="35%" style="display: block; margin: auto;" /> --- class: middle ## How to adjust for confounding Two categories of methods for confounding adjustment: **<span style="color:blue"> 1) G-methods (including the g-formula, IP weighting, and g-estimation).</span>** These exploit conditional exchangeability in subsets defined by L to estimate the causal effect of A on Y in the entire population or in any subset of the population.
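As a concrete instance of IP weighting, here is a minimal sketch on the age-stratified UK counts (Table 5), assuming a saturated (nonparametric) model for `\(Pr(A=a|L=l)\)` computed directly from the cell counts; with a single categorical L, a saturated logistic regression would give the same probabilities. With this propensity, the IP-weighted risks coincide with the directly standardized risks (0.94% vs 2.29%) computed for this example. Object names are illustrative.

```r
# Age-stratified counts (Table 5); L = age group, A = vaccination
uk <- data.frame(
  age   = c("<50", ">=50", "<50", ">=50"),
  vax   = c(1, 1, 0, 0),
  cases = c(453, 1908, 2416, 664),
  n     = c(85407, 71991, 248803, 8551)
)

# Saturated (nonparametric) propensity: Pr(A = a | L = l) = n(a, l) / n(l)
uk$n_age <- ave(uk$n, uk$age, FUN = sum)
uk$ps    <- uk$n / uk$n_age

# Weight 1 / Pr(A = a | L = l) for every person in cell (a, l).
# Weighted cell size = n / ps = n(l), so in the pseudo-population each age group
# has n(l) vaccinated and n(l) unvaccinated people: A no longer depends on L.
uk$w_n     <- uk$n / uk$ps
uk$w_cases <- uk$cases / uk$ps

# IP-weighted risks by vaccination status ("0" = unvaccinated, "1" = vaccinated)
ipw_risk <- tapply(uk$w_cases, uk$vax, sum) / tapply(uk$w_n, uk$vax, sum)
round(100 * ipw_risk, 2)   # 2.29 and 0.94 per 100
```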
- Under the assumption of conditional exchangeability given `\(L\)`, g-methods simulate the `\(A-Y\)` association in the population as if backdoor paths involving variables `\(L\)` did not exist; the simulated `\(A-Y\)` association can then be attributed to the effect of `\(A\)` on `\(Y\)`. - IP weighting achieves this by creating a pseudo-population in which `\(A\)` is independent of the measured confounders `\(L\)`, that is, by “deleting” the arrow from `\(L \to A\)`. --- class: middle ## How to adjust for confounding **<span style="color:blue"> 2) Stratification-based methods (including stratification, restriction, and matching). </span>** These methods exploit conditional exchangeability in subsets defined by L to estimate the association between A and Y in those subsets only, i.e., in one or more subsets of the population in which the treated and the untreated are assumed to be exchangeable. - Hence the `\(A-Y\)` association in each subset is entirely attributed to the effect of `\(A\)` on `\(Y\)`. - Stratification/restriction do not delete the arrow from `\(L \to A\)`; instead, they calculate the association within strata of `\(L\)`, because within each level of `\(L\)` there is no `\(L \to A\)` association to cause confounding. --- class: middle ## How to adjust for confounding All these methods require conditional exchangeability given the measured covariates `\(L\)` to identify the effect of treatment `\(A\)` on outcome `\(Y\)`. - When interested in the effect in the entire population, conditional exchangeability is required in all strata defined by `\(L\)`; - When interested in the effect in a subset of the population, conditional exchangeability is required in that subset only. - Achieving conditional exchangeability may be an unrealistic goal in many observational studies, but expert knowledge can be used to get as close as possible to that goal.
- At the very least, investigators should generally avoid adjustment for variables affected by either the treatment or the outcome. --- class: middle ## How to adjust for confounding Thoughtful and knowledgeable investigators could believe that various causal structures, possibly leading to different conclusions regarding confounding, are equally plausible. - DAGs simply allow us to have that discussion. - The existence of common causes of treatment and outcome does not depend on the adjustment method (although it does depend on the target population). - Adjustment for measured confounding will generally imply a change in the estimate, but not necessarily the other way around. - Changes in estimates may occur for reasons other than confounding, **including selection bias when adjusting for non-confounders and the use of non-collapsible effect measures.** **H & R write:** > _"Attempts to define confounding based on change in estimates have been long abandoned because of these problems."_ This is overstated: when using a DAG and collapsible measures, the change-in-estimate method is a reasonable and practical strategy. --- class: middle **A note on stratification and non-collapsibility** Comparing crude to adjusted estimates is reliable for RR and RD, but not for the OR unless: a) the outcome is rare, or b) the OR ≈ RR by design (e.g., case-cohort).
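A minimal sketch with hypothetical, noise-free risks (not data from the slides) in which L is independent of A, so L cannot confound, yet the crude OR differs from the common stratum-specific OR while the RD does not change. Object names are illustrative.

```r
# Hypothetical cohort: within each exposure group, half are L = 0 and half L = 1,
# so L is independent of A and cannot confound A -> Y.
cells <- data.frame(
  L = c(0, 0, 1, 1),
  A = c(0, 1, 0, 1),
  p = c(0.2, 0.5, 0.5, 0.8),   # Pr(Y = 1 | A, L), chosen so the OR = 4 in both strata
  n = c(500, 500, 500, 500)
)
odds <- function(p) p / (1 - p)

# Stratum-specific OR and RD (columns L0, L1)
strata <- sapply(c(L0 = 0, L1 = 1), function(l) {
  s  <- cells[cells$L == l, ]
  p1 <- s$p[s$A == 1]
  p0 <- s$p[s$A == 0]
  c(OR = odds(p1) / odds(p0), RD = p1 - p0)
})
strata   # OR = 4 and RD = 0.3 in both strata

# Crude (marginal) risks: average over L within each exposure group
p1 <- with(cells[cells$A == 1, ], sum(p * n) / sum(n))   # 0.65
p0 <- with(cells[cells$A == 0, ], sum(p * n) / sum(n))   # 0.35
c(OR = odds(p1) / odds(p0), RD = p1 - p0)               # OR about 3.45, RD = 0.3
```

Here the crude OR (169/49, about 3.45) is closer to the null than the stratum-specific OR of 4 even though there is no confounding, whereas the RD is 0.3 both crude and within strata.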
**Recall the case of `\(C \to E \to Y\)`** <table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="1"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Measure</div></th> <th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Estimate 95%CIs</div></th> </tr> <tr> <th style="text-align:left;"> Measure </th> <th style="text-align:right;"> Est. </th> <th style="text-align:right;"> LB </th> <th style="text-align:right;"> UB </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Inc risk ratio (crude) </td> <td style="text-align:right;"> 1.91 </td> <td style="text-align:right;"> 1.36 </td> <td style="text-align:right;"> 2.70 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (M-H) </td> <td style="text-align:right;"> 1.75 </td> <td style="text-align:right;"> 1.14 </td> <td style="text-align:right;"> 2.69 </td> </tr> <tr> <td style="text-align:left;"> Inc risk ratio (crude:M-H) </td> <td style="text-align:right;"> 1.09 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude) </td> <td style="text-align:right;"> 5.12 </td> <td style="text-align:right;"> 2.02 </td> <td style="text-align:right;"> 12.97 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (M-H) </td> <td style="text-align:right;"> 3.83 </td> <td style="text-align:right;"> 1.45 </td> <td style="text-align:right;"> 10.09 </td> </tr> <tr> <td style="text-align:left;"> Inc odds ratio (crude:M-H) </td> <td style="text-align:right;"> 1.34 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude) 
* </td> <td style="text-align:right;"> 37.15 </td> <td style="text-align:right;"> 19.01 </td> <td style="text-align:right;"> 55.30 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (M-H) * </td> <td style="text-align:right;"> 31.20 </td> <td style="text-align:right;"> -6.71 </td> <td style="text-align:right;"> 69.12 </td> </tr> <tr> <td style="text-align:left;"> Attrib inc risk (crude:M-H) </td> <td style="text-align:right;"> 1.19 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table> --- ### A note on stratification and non-collapsibility - We can say that a measure of the association between A and Y is collapsible across L if the adjusted association, `\(RR_{AY|L}\)`, is equal to the crude association, `\(RR_{AY}\)`, when L is not a confounder. In other words, a collapsible crude measure of association will not change if we adjust for a variable `\((L)\)` that is not a confounder. - The odds ratio (OR) and the incidence density ratio (IDR) fail this property and are considered non-collapsible effect measures - For the OR, the crude measure may be closer to the null than the pooled/adjusted OR, particularly with a common outcome - Therefore, for some measures, our simple crude vs. adjusted comparison **may suggest confounding when there really isn’t!** --- class: middle ### Change in estimate?? ## .red[Not Really!!!] <img src="images/changeest.png" width="65%" style="display: block; margin: auto;" /> --- class: middle ### Structural confounding, violation of Positivity A very high correlation between a confounder and the exposure leads to (near) violations of the “positivity assumption”. When this is “structural” (in the sense of a high correlation that exists because of causal relations in the source population), Oakes calls this “structural confounding”. <img src="images/L6strconfounding.png" width="35%" /> Oakes JM. Advancing neighbourhood-effects research selection, inferential support, and structural confounding. Int J Epidemiol.
2006 Jun;35(3):643-7. Messer et al. Effects of Socioeconomic and Racial Residential Segregation on Preterm Birth: A Cautionary Tale of Structural Confounding AJE 2010; Mar 15;171(6):664-73. --- class: middle ### Structural confounding, violation of Positivity .pull-left[ **Data Generation Process** ```r set.seed(704); n=500 ses1 <- sample(1:12, n, replace = TRUE); ses1[ses1>=10]<-0 ses2 <- cut(ses1, breaks = c(0, 5, 10, 15), labels = c("0", "1", "2")) ses2[is.na(ses2 )]<- "2" exposure<- ifelse(ses2=="1", rbinom(n,1,0.45), ifelse(ses2=="0", rbinom(n,1,0.5), ifelse(ses2=="2", rbinom(n,1,0.0001), rbinom(n,1,0.2)))) outcome<- ifelse(ses2=="0", rbinom(n,1,0.75), ifelse(ses2=="1", rbinom(n,1,0.25), rbinom(n,1,0.25))) data.strconf <- data.frame(outcome, exposure, ses2) table(exposure, ses2) ``` ``` ## ses2 ## exposure 0 1 2 ## 0 101 107 123 ## 1 104 65 0 ``` ```r strconf2 <- glm(outcome ~ exposure, data= data.strconf, family = "binomial") strconf2a <- glm(outcome ~ exposure + as.factor(ses2), data= data.strconf, family = "binomial") ``` ] -- .pull-right[ **Regression Results** **Crude/Unadjusted** <table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> exp(Est.) </th> <th style="text-align:right;"> 2.5% </th> <th style="text-align:right;"> 97.5% </th> <th style="text-align:right;"> z val. 
</th> <th style="text-align:right;"> p </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 0.663 </td> <td style="text-align:right;"> 0.532 </td> <td style="text-align:right;"> 0.827 </td> <td style="text-align:right;"> -3.657 </td> <td style="text-align:right;"> 0.000 </td> </tr> <tr> <td style="text-align:left;"> exposure </td> <td style="text-align:right;"> 1.677 </td> <td style="text-align:right;"> 1.154 </td> <td style="text-align:right;"> 2.437 </td> <td style="text-align:right;"> 2.713 </td> <td style="text-align:right;"> 0.007 </td> </tr> </tbody> </table> **Adjusted** <table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> exp(Est.) </th> <th style="text-align:right;"> 2.5% </th> <th style="text-align:right;"> 97.5% </th> <th style="text-align:right;"> z val. 
</th> <th style="text-align:right;"> p </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 2.461 </td> <td style="text-align:right;"> 1.676 </td> <td style="text-align:right;"> 3.612 </td> <td style="text-align:right;"> 4.598 </td> <td style="text-align:right;"> 0.000 </td> </tr> <tr> <td style="text-align:left;"> exposure </td> <td style="text-align:right;"> 1.011 </td> <td style="text-align:right;"> 0.634 </td> <td style="text-align:right;"> 1.613 </td> <td style="text-align:right;"> 0.046 </td> <td style="text-align:right;"> 0.963 </td> </tr> <tr> <td style="text-align:left;"> as.factor(ses2)1 </td> <td style="text-align:right;"> 0.119 </td> <td style="text-align:right;"> 0.074 </td> <td style="text-align:right;"> 0.190 </td> <td style="text-align:right;"> -8.860 </td> <td style="text-align:right;"> 0.000 </td> </tr> <tr> <td style="text-align:left;"> as.factor(ses2)2 </td> <td style="text-align:right;"> 0.168 </td> <td style="text-align:right;"> 0.097 </td> <td style="text-align:right;"> 0.290 </td> <td style="text-align:right;"> -6.399 </td> <td style="text-align:right;"> 0.000 </td> </tr> </tbody> </table> ] --- class: middle ##Which way will the confounding go? <img src="images/L6_dag_conf2.png" width="50%" /> `Vander Stoep A, et al. A didactic device for teaching epidemiology students how to anticipate the effect of a third factor on an exposure-outcome relation. AJE 1999; 15;150(2):221.` --- class: middle ##Which way will the confounding go? <img src="images/L6_dag_conf1.png" width="50%" /> **These schematics are just illustrations: the direction of the bias depends on the strength (degree of correlation) of the associations with the covariates!! <span style="color:blue">Simulations work better than "blanket" statements.</span>** --- ## Positive, negative, and “qualitative” confounding - Confounding may lead to an overestimation or an underestimation of the true magnitude of an effect.
- **<span style="color:blue"> Positive confounding:</span>** the magnitude of the unadjusted vis-à-vis the adjusted association is exaggerated. - **<span style="color:blue"> Negative confounding:</span>** the magnitude of the unadjusted vis-à-vis the adjusted association is underestimated. - **<span style="color:blue"> Qualitative confounding:</span>** an extreme case in which confounding inverts the direction of the association. --- class: middle ## Magnitude of confounding - The magnitude of confounding will depend on the strength of the confounder-exposure AND confounder-outcome associations. - Conversely, if there is no confounder-exposure association OR no confounder-outcome association, the variable cannot confound the main effect. - The strength of the confounder-exposure and confounder-outcome associations bounds the confounding effect - e.g., if `\(RR_{crude} = 2\)` and the confounder-outcome relation is 2 (a doubling of risk), then the confounder would have to be perfectly correlated with the exposure in order to fully explain the main effect of RR=2 --- class: middle **How strong would the _unmeasured confounding_ have to be to explain away my estimated association?** **<span style="color:blue"> E-values</span>** answer this question for ratio `\(^1\)` measures: $$\text{E-value} = RR + \sqrt{RR \times (RR-1)}$$ (for `\(RR > 1\)`; for `\(RR < 1\)`, apply the formula to `\(1/RR\)`) -- - The E-value is the minimum strength of association, on the risk ratio scale, that an unmeasured confounder `\(U\)` would need to have with both the exposure ( `\(U \to A\)` ) and the outcome ( `\(U \to Y\)` ), conditional on the measured covariates, to fully explain away the observed association. - Example: RR=1.33; `\(1.33 + \sqrt{1.33 \times (1.33-1)} =\)` 1.99. Thus, to completely explain away the observed association, an unmeasured `\(U\)` would need to: 1) roughly double the risk of the outcome among the unexposed and/or exposed ( `\(RR_{UY} \approx 2\)` ), AND 2) be about twice as prevalent among the exposed as among the unexposed ( `\(RR_{AU} \approx 2\)` ). A weaker confounder (given the E-value), say with associations of 1.5 or 1.3, could not.
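A minimal sketch of the E-value formula for a risk-ratio point estimate; `evalue_rr()` is a hypothetical helper, not a package function. It assumes the usual convention of inverting protective ratios (RR < 1) before applying the formula, and it does not handle confidence limits, ORs, or HRs (for those, see the E-value calculator linked in the footnote).

```r
# Hypothetical helper: E-value for an observed risk ratio (point estimate only)
evalue_rr <- function(rr) {
  stopifnot(is.numeric(rr), all(rr > 0))
  rr <- ifelse(rr < 1, 1 / rr, rr)   # protective RRs are inverted first
  rr + sqrt(rr * (rr - 1))
}

evalue_rr(1.33)       # about 1.99, as in the example
evalue_rr(1 / 1.33)   # same E-value for the inverted (protective) RR
evalue_rr(1)          # 1: no confounding is needed to explain a null RR
```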
`\(^1\)` <span style="color:blue"> E-values are debated by some, but they are straightforward to calculate and useful information to have. </span> Versions of the E-value exist for ORs and HRs: [E-value calculator](https://www.evalue-calculator.com/).

---

### Statistical significance?

## In general, NO!

- But if you MUST use p-values to select confounders, set the criterion on the high side (e.g., p < 0.30). This way you adjust for some non-confounders, but you don’t miss many true confounders.

`Mickey RM, Greenland S. The impact of confounder selection criteria on effect estimation. AJE 1989;129(1):125-37.`

- Residual confounding (unmeasured L’s (U1, U2, etc.), categorization, measurement error, etc.):

`Kaufman JS, et al. Socioeconomic status and health in blacks and whites: the problem of residual confounding and the resiliency of race. Epidemiology 1997;8(6):621-8.`

`Ogburn EL, VanderWeele TJ. Bias attenuation results for nondifferentially mismeasured ordinal and coarsened confounders. Biometrika 2013;100(1):241-248. PMID: 24014285`

---

## Residual confounding

Residual confounding occurs when adjustment does not completely remove the confounding effect of one or more variables, for example because of:

**1) Misclassification of confounding variables**
- (e.g., the variable is an imperfect proxy for the characteristic we want to adjust for)

**2) Improper modeling of the confounding variable**
- (e.g., if we are studying air pollution and lung cancer and want to control for smoking, we should measure smoking in the way that best predicts lung cancer, i.e., pack-years, not ever/never)

**3) Omission of other important confounders (also known as unmeasured confounding or omitted variable bias)**

---
class: middle

## Validity and Bias:

- The epidemiologist’s goal: the most **VALID and PRECISE** estimate possible of the causal effect of exposure on disease.

- Error comes from sampling variability (lack of precision) and bias (lack of validity).
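A minimal simulation sketch of residual confounding from improper modeling of a confounder (the pack-years vs ever/never smoking example on the Residual confounding slide). Everything here is hypothetical: the data-generating process and parameter values are assumptions chosen for illustration, the exposure is assumed to have no effect at all, and a continuous outcome with linear models is used only to keep the example short.

```r
set.seed(704)
n <- 1e5

# Hypothetical population: about half never smoked (0 pack-years);
# ever-smokers have right-skewed pack-years
ever <- rbinom(n, 1, 0.5)
pack_years <- ifelse(ever == 1, rgamma(n, shape = 2, scale = 10), 0)

# Assumed data-generating process: exposure A is more common at higher
# pack-years, and the outcome Y depends on pack-years only (true effect of A = 0)
A <- rbinom(n, 1, plogis(-2 + 0.08 * pack_years))
Y <- 0.05 * pack_years + rnorm(n)

b_crude <- coef(lm(Y ~ A))[["A"]]
b_ever  <- coef(lm(Y ~ A + ever))[["A"]]        # smoking coarsened to ever/never
b_pack  <- coef(lm(Y ~ A + pack_years))[["A"]]  # smoking modelled as measured

round(c(crude = b_crude, ever_never = b_ever, pack_years = b_pack), 2)
```

Under these assumptions, both the crude and the ever/never-adjusted coefficients for `A` come out clearly positive even though the true effect is 0; only adjusting for pack-years as measured brings the estimate close to 0, because among ever-smokers the exposed still have more pack-years than the unexposed.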
<img src="images/L6validity.png" width="50%" />

---
class: middle

## Confounded `\(^1\)`?

<img src="images/L6confounded.png" width="50%" />

`\(^1\)` We all are!! We will return to this, with empirical examples, after we cover regression.

---
class: middle

### QUESTIONS?

## COMMENTS?

# RECOMMENDATIONS?

---