Measures of Association

.title[
# Measures of Association
]
.author[
### Mabel Carabali
]
.institute[
### EBOH, McGill University
]
.date[
### Updated: 2025-09-09
]

---

## Expected Competencies

- Know what measures of frequency, association, and effect are.

- Know the difference between relative and absolute measures.

- Recognize and correctly interpret measures of association.

---
class: middle

## Objectives

• To clarify potential misconceptions about measures of association.

• To introduce the counterfactual and potential outcomes framework.

• To identify advantages and adequate use of different measures.

---
class: middle

<img src="images/L5_anaphylaxis.png" width="100%" style="display: block; margin: auto;" />
[Anaphylaxis: Clinical sciences. Osmosis from Elsevier](https://www.osmosis.org/learn/Anaphylaxis:_Clinical_sciences)

---
class: middle

[Allergic reactions during coronary angiography or PCI in Poland: Occurrence, trends, and long-term perspective based on the Polish ORPKI registry. Polish Heart Journal (Kardiologia Polska) Vol 83, No 3 (2025)](https://journals.viamedica.pl/polish_heart_journal/article/view/104049/83023)

---
class: middle
### What's an Association?

Our common objective of epidemiologic research:

- The effect of exposure ** `$X$` ** on the occurrence of outcome ** `$Y$`** 
   - But we can rarely observe or even estimate this effect directly.

– It involves the same people at the same time in contrasting exposures, which is .red[impossible].

– We **observe an association** between the _Exposure_ and _Outcome_ among study subjects, which estimates a population association.

_**"The observed association will be a poor substitute for the desired effect, if it is a poor estimate of the population association, or if the population association is not itself close to the effect of interest."**_ ME4 (2020)

---
class: middle
### What's an _Effect_ ?

Effect here means the end point of a causal mechanism, i.e., identifying the type of outcome that a cause produces.

EXAMPLE: _"Liver cirrhosis is an **effect** of chronic excessive alcohol consumption"._

- This use of the term effect **merely** identifies liver cirrhosis as **_one consequence of chronic excessive alcohol consumption_**. 
 - **Compared** to something else (e.g., Abstinence or another level of consumption).
 - Cirrhosis may **not be the only** effect of chronic excessive alcohol consumption. 
 - May **change** across populations and/or over time.

.purple[ _"An effect of some factor is thus relative to the outcomes, to the population, and to the time frame."_ [ME4 (2020)]
]

---
class: middle
### Exposure vs Cause
An exposure (usually denoted as `$X$`) is a _potential_ causal characteristic, _"a factor that produces an outcome"_.
- Could be the _sole_ or _compounded_ cause `$^1$` of an Outcome. Can be a behavior, a treatment/intervention, a social condition, a health condition, a genetic trait...
--

.small[
`$^1$` For more, see the [Sufficient Component Causal Framework and Bradford Hill criteria](https://ajph.aphapublications.org/doi/10.2105/AJPH.2004.059204). In this course we focus on [potential outcomes and causal DAGs](https://academic.oup.com/ije/article/31/5/1030/745818?login=false) as approaches to causal inference.]
.small[
- [Ellis A.K, et al. (2025)  Biphasic anaphylaxis in a Canadian tertiary care centre: an evaluation of incidence and risk factors from electronic health records and telephone interviews](https://doi.org/10.1186/s13223-024-00919-2)
]

---
class: middle
## Exposures vs Causes

.pull-left[
<img src="images/L5_anaphylaxis.png" width="100%" style="display: block; margin: auto;" />

[Anaphylaxis: Clinical sciences. Osmosis from Elsevier](https://www.osmosis.org/learn/Anaphylaxis:_Clinical_sciences)
]
--
.pull-right[
**Anaphylaxis Etiology**
> _"Common inciting sources may include **exposure** to certain medications, foods, or insect stings."_

> _"Occasionally, the offending agent is not identified; these reactions are idiopathic anaphylaxis."_

> _"The most common **causes** include bee stings, fire ant bites, latex, and foods (peanuts, tree nuts, fish, shellfish, milk, eggs, wheat, spelt, rye, barley, soy, red meat, and sesame)."_

[Anaphylaxis StatPearls NLM](https://www.ncbi.nlm.nih.gov/books/NBK482124/)
]
---
class: middle
### Exposures vs Causes
.pull-left[
<img src="images/allergic_PCI_polland.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="images/L5_PCI_polandtable3.png" width="100%" style="display: block; margin: auto;" />
]

.small[
[Allergic reactions during coronary angiography or PCI in Poland: Occurrence, trends, and long-term perspective based on the Polish ORPKI registry. Polish Heart Journal (Kardiologia Polska)Vol 83, No 3 (2025)](https://journals.viamedica.pl/polish_heart_journal/article/view/104049/83023)
]

---

class: middle
#### Not every Association between Exposure and Outcome is _"Causal"_

**.red[Recall:]**

At the population level, we assess the effects with measures of occurrence and we **estimate the associations by contrasting such measures of occurrence in the population.**

---
class: middle
## Absolute vs. Relative measures
- Absolute effect measures are **differences** in occurrence measures.
- Relative effect measures are **ratios** of occurrence measures.

---

Absence of contrast on both the Absolute (Difference) and Relative (Ratio) Scales

** `$$R_{exp} - R_{Non-exp} =  0$$`**

and

** `$$\left(\frac{R_{exp}} {R_{Non-exp}}\right) = 1$$`**

---

class: middle
## Absolute vs. Relative measures
- Absolute effect measures are **differences** in occurrence measures.
- Relative effect measures are **ratios** of occurrence measures.

|Sample      | Events / Exposed | Events / Non-Exposed | Risk Among Exposed | Risk Among Non-Exposed | Risk Difference | Risk Ratio    |
|:-----------|:-----------:|:-----------:|:------------------:|:-----------------------:|:---------------:|:-------------:|
| 63 | 9/36 | 6/27| 0.25 | 0.22 | 0.03 | 1.12|

**Key elements:** Exposure, Outcome & Measures of Occurrence !

---
class: middle
### The 2x2 Table

- A summary table of observations.

<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'>
 <thead>
<tr>
<th style="empty-cells: hide;" colspan="1"></th>
<th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Outcome</div></th>
<th style="empty-cells: hide;" colspan="1"></th>
</tr>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:left;"> Outcome </th>
   <th style="text-align:left;"> No Outcome </th>
   <th style="text-align:left;"> Total </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Exposed </td>
   <td style="text-align:left;"> A </td>
   <td style="text-align:left;"> B </td>
   <td style="text-align:left;"> A+B </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Not Exposed </td>
   <td style="text-align:left;"> C </td>
   <td style="text-align:left;"> D </td>
   <td style="text-align:left;"> C+D </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Total </td>
   <td style="text-align:left;"> A+C </td>
   <td style="text-align:left;"> B+D </td>
   <td style="text-align:left;"> A+B+C+D </td>
  </tr>
</tbody>
</table>
<br>

---
class: middle
### The 2x2 Table

From the previous example:

.small[
<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'>
 <thead>
<tr>
<th style="empty-cells: hide;" colspan="1"></th>
<th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Outcome</div></th>
<th style="empty-cells: hide;" colspan="1"></th>
</tr>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Dead </th>
   <th style="text-align:right;"> Alive </th>
   <th style="text-align:right;"> Total </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Exposed </td>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:right;"> 27 </td>
   <td style="text-align:right;"> 36 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Not Exposed </td>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 21 </td>
   <td style="text-align:right;"> 27 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Total </td>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:right;"> 48 </td>
   <td style="text-align:right;"> 63 </td>
  </tr>
</tbody>
</table>
]
<br>
--

**The measures are:**

**Key elements:** Exposure, Outcome & Measures of Occurrence !
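
As a minimal sketch, the measures can be computed by hand in R from the counts in the earlier risk table (9 deaths among 36 exposed, 6 among 27 non-exposed). The object name `tab` is illustrative and not part of the course code.

``` r
# 2x2 table: rows = exposure, columns = outcome
# (counts assumed from the earlier risk table: 9/36 and 6/27)
tab <- matrix(c(9, 27,
                6, 21),
              nrow = 2, byrow = TRUE,
              dimnames = list(Exposure = c("Exposed", "Not exposed"),
                              Outcome  = c("Dead", "Alive")))

risk_exp   <- tab["Exposed", "Dead"] / sum(tab["Exposed", ])          # 9/36
risk_unexp <- tab["Not exposed", "Dead"] / sum(tab["Not exposed", ])  # 6/27

RD <- risk_exp - risk_unexp                                           # absolute contrast
RR <- risk_exp / risk_unexp                                           # relative contrast
OR <- (tab["Exposed", "Dead"] / tab["Exposed", "Alive"]) /
      (tab["Not exposed", "Dead"] / tab["Not exposed", "Alive"])      # contrast of odds

c(RD = RD, RR = RR, OR = OR)
```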

---
class: middle
## Absolute measures: Risk Differences

The RD provides the absolute change in risk
- Indicates how much of the effect is attributable to exposure
- It does not tell us where on the risk scale the shift occurs: from 1% to 16%, or from 81% to 96%?
- Clinical vs statistical importance
- Public Health Relevance
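
To make the 1% to 16% versus 81% to 96% question concrete, here is a small sketch with those two hypothetical scenarios. Both share the same risk difference, yet the relative change is very different.

``` r
# Two hypothetical scenarios with the same absolute change in risk
baseline <- c(low = 0.01, high = 0.81)   # risk among the unexposed
exposed  <- c(low = 0.16, high = 0.96)   # risk among the exposed

RD <- exposed - baseline   # 0.15 in both scenarios
RR <- exposed / baseline   # 16 in the first scenario, about 1.19 in the second

data.frame(baseline, exposed, RD, RR = round(RR, 2))
```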

---
class: middle
## Relative Measures: Risk Ratios

- Relative measures are popular  and practical
- Easier to obtain
 - Dichotomous outcomes!
- Useful in both causal inference and prediction

**Interpretability?**

- The RR’s magnitude changes according to the coding scheme (e.g., which outcome category is coded as the event)
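
A short illustration of the coding-scheme point, assuming the risks from the earlier table (9/36 and 6/27): recoding the outcome from "death" to "survival" gives a risk ratio that is not simply the reciprocal of the original one.

``` r
risk_exp   <- 9 / 36   # risk of death among the exposed
risk_unexp <- 6 / 27   # risk of death among the non-exposed

RR_death    <- risk_exp / risk_unexp               # outcome coded as "death"
RR_survival <- (1 - risk_exp) / (1 - risk_unexp)   # outcome coded as "survival"

# If recoding only flipped the direction, RR_survival would equal 1 / RR_death
round(c(RR_death = RR_death,
        RR_survival = RR_survival,
        inverse_RR_death = 1 / RR_death), 3)
```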

---
class: middle
## Why not both?
- It’s possible to see a reduction in absolute estimates but an increase in relative measures (and vice versa).
 
These are complementary estimators! Each tells you something different about the data.
 - In fact, STROBE and CONSORT guidelines now advise researchers to publish both measures.

### <span style="color:darkmagenta">695 reactions vs 482 = 29% increase vs 20 more cases per 1000 people?</span>

In an epidemiological utopia, researchers would run the model of their choice, obtain relative and absolute estimates, and publish these along with the baseline/background risk

**But we live in the real world, so we’re more likely to encounter…**

---
class: middle
## ...Odds ratios

`$Odds = \left(\frac{P}{1-P}\right)$`
 
 The odds ratio is the relative contrast between the odds among the exposed and the odds among the non-exposed.
 
  `$$Odds Ratio = \left(\frac{\left(\frac{P_{exp}}{1-P_{exp}}\right)}  {\left(\frac{P_{No-exp}}{1-P_{No-exp}}\right)}\right)$$`
<br>
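
The two formulas above can be sketched in a few lines of R. The helper name `odds()` is hypothetical, and the risks are the ones from the earlier table (9/36 and 6/27).

``` r
odds <- function(p) p / (1 - p)   # convert a probability into odds

p_exp   <- 9 / 36   # risk among the exposed
p_unexp <- 6 / 27   # risk among the non-exposed

odds_ratio <- odds(p_exp) / odds(p_unexp)
odds_ratio
```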
 
---
class: middle
## ...Odds ratios

We are already aware of some key problems with odds and therefore odds ratios

- They overestimate risks 
 - While probabilities are bounded [0, 1], **odds** can range from 0 to `$\infty$`
- .red[They’re not intuitive (except as an approximation of the risk ratio)] `$^1$`
- And most of the time we care about probabilities, not odds

<br>
`$^1$` .small[ **When the probability is small (<0.10) or given the study design (e.g., case-cohort studies) with rare outcomes** ]
 
---
class: middle
## Risk Differences and Risk Ratios

|Sample      | Risk Among Exposed | Risk Among Non- Exposed | Risk Difference | Risk Ratio    | Odds Ratio    |
|:-----------|:------------------:|:-----------------------:|:---------------:|:-------------:|:-------------:|
| 63 |  0.25 | 0.22 | 0.03 | 1.12| 1.17|
| 63 |  0.17 | 0.15 | 0.02 | 1.12| 1.15|
| 630|  0.017 | 0.015 | 0.002 | 1.12| 1.15|
| 630|  0.25 | 0.22 | 0.03 | 1.12| 1.17|

---

Absence of contrast on both the Absolute (Difference) and Relative (Ratio) Scales

`$R_{exp} - R_{Non-exp} =$` ** `$0$`** ; or `$\left(\frac{R_{exp}} {R_{Non-exp}}\right) =$` ** `$1$`**

Example:

|Sample      | Events / Exposed | Events / Non-Exposed | Risk Exposed | Risk Non-Exposed | Risk Difference | Risk Ratio  | Odds Ratio*   |
|:-----------|:-----------:|:-----------:|:------------:|:------------------:|:---------------:|:-----------:|:-------------:|
| 63 | 4/36 | 3/27| 0.11 | 0.11 | 0 | 1| 1 |

- Null Value (Absolute) = (4/36) - (3/27) = 0
- Null Value (Relative) = (4/36) / (3/27) = 1

**Note**
1. When the RR is above or below the null (>1 or <1), the OR is **further** away from the null.
2. When the absolute risks in the exposure groups are high, the OR will considerably overstate the RR.
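
The two notes can be checked numerically. The sketch below fixes a hypothetical risk ratio of 1.5 and lets the baseline risk grow; the values are illustrative only.

``` r
odds <- function(p) p / (1 - p)

RR         <- 1.5                              # fixed risk ratio (hypothetical)
risk_unexp <- c(0.01, 0.05, 0.10, 0.30, 0.50)  # increasingly common outcome
risk_exp   <- RR * risk_unexp                  # stays below 1 for these values

OR <- odds(risk_exp) / odds(risk_unexp)

# The OR is always further from the null than the RR, and the gap widens with risk
data.frame(risk_unexp, risk_exp, RR, OR = round(OR, 2))
```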

---
class: middle
### Some simulated Examples

- Generating 10 data-sets with the _same_ structure.

``` r
set.seed(7042025)
z <- rnorm(500) # not used below; kept so the random-number stream (and the results shown) do not change
e <- matrix(NA, nrow = 500, ncol = 10) # empty matrix: 500 people x 10 data sets
for (i in 1:10) {             # loop over the 10 data sets
    e[, i] <- ifelse(rnorm(500) > 0.8, 1, 0) # binary exposure: 1 if a standard normal draw exceeds 0.8 (about 21%)
}
e <- as.data.frame(e) # change it into a data frame
names(e) <- LETTERS[1:10] # name the columns A to J

# outcome Y: risk of 0.65 if exposed (e == 1) and 0.20 if unexposed
y <- ifelse(e == 1, rbinom(5000, 1, 0.65), rbinom(5000, 1, 0.2))
y <- as.data.frame(y) # change the matrix to a data frame
# name the columns Ya to Yj; paste0() pastes its arguments together without a space
names(y) <- paste0("Y", letters[1:10])
```
<br>

---

### Some simulated Examples

Verification of "mean" values across datasets.

```
##        A               B               C              D               E       
##  Min.   :0.000   Min.   :0.000   Min.   :0.00   Min.   :0.000   Min.   :0.00  
##  1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.00   1st Qu.:0.000   1st Qu.:0.00  
##  Median :0.000   Median :0.000   Median :0.00   Median :0.000   Median :0.00  
##  Mean   :0.194   Mean   :0.206   Mean   :0.21   Mean   :0.188   Mean   :0.23  
##  3rd Qu.:0.000   3rd Qu.:0.000   3rd Qu.:0.00   3rd Qu.:0.000   3rd Qu.:0.00  
##  Max.   :1.000   Max.   :1.000   Max.   :1.00   Max.   :1.000   Max.   :1.00  
##        F               G             H               I               J        
##  Min.   :0.000   Min.   :0.0   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:0.000   1st Qu.:0.0   1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.000  
##  Median :0.000   Median :0.0   Median :0.000   Median :0.000   Median :0.000  
##  Mean   :0.232   Mean   :0.2   Mean   :0.164   Mean   :0.216   Mean   :0.218  
##  3rd Qu.:0.000   3rd Qu.:0.0   3rd Qu.:0.000   3rd Qu.:0.000   3rd Qu.:0.000  
##  Max.   :1.000   Max.   :1.0   Max.   :1.000   Max.   :1.000   Max.   :1.000
```

---

### Some simulated Examples

**Using "hand calculations" formulas.**

``` r
# One 2x2 table per data set (rows: exposure 0/1; columns: outcome 0/1)
tabs <- lapply(1:10, FUN = function(x) table(e[, x], y[, x]))
# Empty data frame to hold the estimates: one row per data set
ests <- as.data.frame(matrix(NA, nrow = 10, ncol = 3))
# Name columns
names(ests) <- c("RD", "RR", "OR")

# Fill in the estimates, one data set at a time
for (i in 1:10) {
  x <- tabs[[i]]
  ests[i, 1] <- x[2, 2] / sum(x[2, ]) - x[1, 2] / sum(x[1, ])     # RD: risk exposed - risk unexposed
  ests[i, 2] <- (x[2, 2] / sum(x[2, ])) / (x[1, 2] / sum(x[1, ])) # RR: risk exposed / risk unexposed
  ests[i, 3] <- (x[2, 2] / x[2, 1]) / (x[1, 2] / x[1, 1])         # OR: odds exposed / odds unexposed
}
```

---
class: middle
### Some simulated Examples
.pull-left[
<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:right;"> RD </th>
   <th style="text-align:right;"> RR </th>
   <th style="text-align:right;"> OR </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 0.43 </td>
   <td style="text-align:right;"> 3.70 </td>
   <td style="text-align:right;"> 7.55 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.36 </td>
   <td style="text-align:right;"> 2.71 </td>
   <td style="text-align:right;"> 5.00 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.43 </td>
   <td style="text-align:right;"> 3.35 </td>
   <td style="text-align:right;"> 7.17 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.50 </td>
   <td style="text-align:right;"> 3.52 </td>
   <td style="text-align:right;"> 9.46 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.47 </td>
   <td style="text-align:right;"> 3.54 </td>
   <td style="text-align:right;"> 8.29 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.45 </td>
   <td style="text-align:right;"> 3.36 </td>
   <td style="text-align:right;"> 7.66 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.49 </td>
   <td style="text-align:right;"> 3.41 </td>
   <td style="text-align:right;"> 8.77 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.43 </td>
   <td style="text-align:right;"> 2.74 </td>
   <td style="text-align:right;"> 6.50 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.52 </td>
   <td style="text-align:right;"> 3.73 </td>
   <td style="text-align:right;"> 10.50 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0.43 </td>
   <td style="text-align:right;"> 3.17 </td>
   <td style="text-align:right;"> 6.76 </td>
  </tr>
</tbody>
</table>
]

.pull-right[
<img src="L5_EPIB704_Association_25_files/figure-html/unnamed-chunk-16-1.svg" width="90%" style="display: block; margin: auto;" />
]

<br>

---

### Some simulated Examples

**Using regressions `$^1$` to obtain the estimates (_Rare-ish_ outcomes)**
.small[

``` r
set.seed(7042025)
yea.dat <- function(n) {
  E <- rbinom(n, 1, 0.55) # exposure: prevalence of 0.55
  Y <- rbinom(n, 1, 0.12) # outcome: risk of 0.12, generated independently of E
  return(data.frame(E = E, Y = Y)) # return a data set with those parameters
}
sim100 <- lapply(1:100, FUN = function(x) yea.dat(400)) # 100 data sets of 400 people each
summary((sim100[[13]]))  #; summary((sim100[[93]]))
```

```
##        E                Y       
##  Min.   :0.0000   Min.   :0.00  
##  1st Qu.:0.0000   1st Qu.:0.00  
##  Median :1.0000   Median :0.00  
##  Mean   :0.5025   Mean   :0.11  
##  3rd Qu.:1.0000   3rd Qu.:0.00  
##  Max.   :1.0000   Max.   :1.00
```
]

`$^1$` .small[ _Some of you may have advanced knowledge on regression analysis but since we have not explained it during the course this resource is only for illustration purposes._]

---
class: middle
### Some simulated Examples

``` r
library(logbin) # logbin() fits log-binomial models; exponentiated coefficients are risk ratios

RRs <- sapply(sim100,FUN=function(x) {
   results <- logbin(Y ~ E , data=x)$coef
   return(round(exp(results[names(results)=="E"]),2)) 
 })

ORs <- sapply(sim100,FUN=function(x) {
   results <- glm(Y ~ E , family="binomial", data=x)$coef
   return(round(exp(results[names(results)=="E"]),2)) 
 })

sim_RRs<-round(quantile(RRs,probs = c(0.05,0.5,0.95)),2)
sim_ORs<-round(quantile(ORs,probs = c(0.05,0.5,0.95)),2)
```

---
class: middle
### Some simulated Examples

```
##           5%  50%  95%
## sim_RRs 0.63 1.00 1.80
## sim_ORs 0.59 1.02 1.93
```

---
class: middle
## Associations Vs Causes

Pearl (2000) uses ** `$Pr(Y=y|SET[X=x])$`** to define the probability of an event if the condition `$X=x$` were **enforced uniformly** over a population.
 
– The key to this definition: **it involves intervention, not observation.**

– Measures of effect can be built based on **SET** notation by creating contrasts of probabilities (or risks) across different `$X$` values.

– The [What If? book](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/) expresses the same notion with `$Y^{X=x}$`.

---

## Associations Vs Causes

**Recall:**

**Be aware of the difference!**

---
class: middle
### Measures of causal effect

Measures of **causal effect** require a contrast of two **counterfactual** quantities:

- ** `$Pr(Y_i = y_i | SET[X_i = 1]) - Pr(Y_i = y_i | SET[X_i=0])$`** 
<br>

<br>

Measures of **association** involve a contrast of two **observed** quantities:

- ** `$Pr(Y_i = y_i | X_i = 1) -  Pr(Y_i = y_i | X_i = 0)$`**
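
The gap between the two contrasts can be illustrated with a small simulation. This is only a sketch under stated assumptions: the confounder `C`, the potential outcomes `Y0` and `Y1`, and every probability below are hypothetical, and in real data only one potential outcome per person is ever observed.

``` r
set.seed(704)                                   # arbitrary seed for this illustration
n  <- 100000
C  <- rbinom(n, 1, 0.5)                         # hypothetical common cause of X and Y
X  <- rbinom(n, 1, ifelse(C == 1, 0.8, 0.2))    # exposure is more likely when C = 1
Y0 <- rbinom(n, 1, ifelse(C == 1, 0.30, 0.10))  # outcome if X were set to 0
Y1 <- rbinom(n, 1, ifelse(C == 1, 0.40, 0.20))  # outcome if X were set to 1
Y  <- ifelse(X == 1, Y1, Y0)                    # the outcome we actually observe

causal_RD <- mean(Y1) - mean(Y0)                # contrast of SET[X = 1] vs SET[X = 0]
assoc_RD  <- mean(Y[X == 1]) - mean(Y[X == 0])  # contrast of the observed groups

round(c(causal = causal_RD, association = assoc_RD), 3)
```

With these inputs the causal risk difference is 0.10 by construction, while the observed contrast is expected to be near 0.22 because `C` affects both exposure and outcome.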

---

## Who are we interested in?

**Target Population:** The group of people about which the scientific or public health question is asked, in the **relevant etiologic time period.**

<br>

**“Target Population” in Encyclopedia of Biostatistics, 2005 [Sander Greenland]**

> _"The concept of a target population is an informal one, sometimes defined as “the population about which information is wanted” [1] or the “totality of elements which are under discussion and about which information is desired” [4] ...The word “target” emphasizes, however, that this population is not necessarily the same as the one that we end up sampling. The latter population is sometimes called the sampled population [1, 4] or (in epidemiology) the source population [6]"_.
> _[1] Cochran, W.G. (1977). Sampling Techniques, 3rd Ed. Wiley, New York._
> _[4] Mood, A.M., et al. (1974). Introduction to the Theory of Statistics. McGraw-Hill, New York._
> _[6] Rothman, K.J. & Greenland, S. (1997). Modern Epidemiology, 2nd Ed. Lippincott, Philadelphia._

<br>   
    
---
class: middle
## Potential Outcomes Framework

We are interested in the effect of exposure ** `$(A=1)$`** on the occurrence of disease ** `$(Y=1)$`**

- Suppose everyone in the **target population** of inference is unexposed ** `$(A=0)$`** and we can observe the distribution of ** `$Y$`** in the population.
  
- We would like to also observe the distribution of ** `$Y$`** had these same people been all exposed ** `$(A=1)$`**

- This is “counter-to-fact”, and we call this condition the counterfactual
 - Each individual has their own **counterfactual** exposure 
   - (what would have happened to me if ...)

---
## Potential Outcomes Framework

- We can never observe both conditions in the same population (or individual). 
  - That is, we cannot observe the distribution of disease under ** `$A=1$`** and ** `$A=0$`** within the same time period in the same cohort. 
  - Thus we need to make an estimate under the condition we do not observe.

- To do so, we use a **substitute population.** 
- Our goal is to choose a substitute population that will best mimic **what would have happened to the target population had they experienced the other exposure condition.**

---
###  Estimating Causal Effects `$^1$`

<img src="images/L5_MG1.jpeg" width="30%" style="display: block; margin: auto;" />
<br>
- If `$B$` = people at risk at the _beginning of the period_ `$^2$`, `$R$` = **incidence proportion (average risk)**.
- If `$B$` = person-time at risk during the period, `$R$` = **person-time incidence rate**. 
- If `$B$` = people who do _NOT_ get disease by the end of the period, `$R$` = **incidence odds.**

<br>
`$^1$` _International Journal of Epidemiology, Volume 31, Issue 2, April 2002, Pages 422–429, https://doi.org/10.1093/intjepid/31.2.422_

`$^2$` _And all individuals are followed throughout the etiologic time period._
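
As a hedged numerical sketch of the three choices of `$B$`, consider a hypothetical closed cohort; every number below is made up for illustration.

``` r
# Hypothetical closed cohort followed for one period
A            <- 30     # new cases during the period (numerator)
n_at_risk    <- 1000   # people at risk at the beginning of the period
person_years <- 970    # person-time at risk during the period (assumed value)

incidence_proportion <- A / n_at_risk        # B = people at risk at baseline
incidence_rate       <- A / person_years     # B = person-time at risk
incidence_odds       <- A / (n_at_risk - A)  # B = people who do NOT get the disease

c(proportion = incidence_proportion,
  rate       = incidence_rate,
  odds       = incidence_odds)
```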

---

## Define the causal effect

Define the counter-to-fact condition and outcome in the target:

**Only possible to, at best, observe one of these conditions.**

---

## Define the causal effect

Let `$R_1 = A_1 / B_1$` and `$R_0 = A_0 / B_0$`

- `$R_1 - R_0$` is the causal **difference measure** and

- `$R_1 / R_0$` is the **causal ratio measure**

- Both of these are causal contrasts (measures of effect).

- Here, the only possible reason for a difference between `$R_1$` and `$R_0$` is due to **exposure** because we are contrasting the **exact same people over the exact same time period**.

- But, we **cannot observe the causal contrast** because we cannot observe both conditions.

**We require a substitute with observable information!**

---

## Can’t observe the counterfactual

.pull-left[
<img src="images/L5_MG3.jpeg" width="80%" style="display: block; margin: auto;" />
]
.pull-right[
<img src="images/L5_MG4.jpeg" width="80%" style="display: block; margin: auto;" />
]

---

## Can’t observe the counterfactual

.pull-left[
<img src="images/L5_MG3.jpeg" width="70%" style="display: block; margin: auto;" />
]
.pull-right[
<img src="images/L5_MG4.jpeg" width="70%" style="display: block; margin: auto;" />
]
<img src="images/L5_MG5.jpeg" width="45%" style="display: block; margin: auto;" />

---

## Can’t observe the counterfactual

> "_Both `$R_1$` and `$R_0$` are counterfactual disease frequencies, both are hypothetical alternatives to the actual disease frequency that occurs under the actual exposure distribution (which is neither exposure distribution 1 nor 0), and therefore neither `$R_1$` nor `$R_0$` can occur and be observed._" 
[Maldonado & Greenland 2002](https://doi.org/10.1093/intjepid/31.2.422)

---

## Can’t observe the counterfactual

.pull-left-narrow[
<img src="images/L5_MG3.jpeg" width="30%" style="display: block; margin: auto;" />
]

The impossibility of observing both halves of the causal contrast leads to the idea of substitute populations.

These are often:
+ Different people observed during the same etiologic time period.
+ The same people observed over two different time periods (case-crossover design).

[Maldonado & Greenland 2002](https://doi.org/10.1093/intjepid/31.2.422)
---

## Defining the substitute population

In a substitute population under **exposure distribution** ** `$1$`**, let

- `$C_1$` be the name for the numerator of the disease-frequency measure, 
- `$D_1$` be the denominator (number of people or amount of person-time at risk).

In a substitute under **exposure distribution** ** `$0$`**, let

- `$E_0$` be the numerator, 
- `$F_0$` be the denominator.

**Exposure = 1**, `$C_1 /D_1$`

**Exposure = 0**, `$E_0 / F_0$`

---

## Defining the substitute population

Target experiences **exposure distribution 1**

<img src="images/L5_MG7.jpeg" width="60%" style="display: block; margin: auto;" />
<br>
--

[Maldonado & Greenland 2002](https://doi.org/10.1093/intjepid/31.2.422)
---

## Defining the substitute population
Target experiences **exposure distribution 0**

<img src="images/L5_MG8.jpeg" width="60%" style="display: block; margin: auto;" />
<br>
--

[Maldonado & Greenland 2002](https://doi.org/10.1093/intjepid/31.2.422)
---
## Defining the substitute population
Target experiences neither **exposure distribution 1 nor 0**

<img src="images/L5_MG9.jpeg" width="60%" style="display: block; margin: auto;" />
<br>
--

[Maldonado & Greenland 2002](https://doi.org/10.1093/intjepid/31.2.422)
---

## Other Notation Used

- Greenland employs a probabilistic model of disease such that each individual `$i$` has a risk `$r_{1_i}$` of disease when `$E=1$` and a risk of `$r_{0_i}$` when `$E=0$`.

- Survival probabilities: `$s_{1_i} = 1 - r_{1_i}$` and `$s_{0_i} = 1 - r_{0_i}$`

- Odds: `$w_{1_i} = r_{1_i} / s_{1_i}$` and `$w_{0_i} = r_{0_i} / s_{0_i}$`. 
  - Only defined when survival probabilities are not equal to zero.

---

## Notation Used

- The effect of exposure on the risk of an individual can be measured in terms of the risk difference `$r_{1_i} - r_{0_i}$`, risk ratio `$r_{1_i} / r_{0_i}$`, or the risk-odds ratio `$w_{1_i} / w_{0_i}$`

- The ratios will be undefined if the risk under non-exposure (`$r_{0_i}$`) is 0, and
- The risk-odds ratio will also be undefined if either survival probability is 0.

---

## Other Notation Used 
In a cohort with `$N_1$` `$E+$` individuals and `$N_0$` `$E-$` individuals:

|          |     E+     |      E-     |
|:--------:|:----------:|:-----------:|
|**D+** | `$A=\sum_{1}r_{1i}$`  | `$B=\sum_{0}r_{0i}$` |
|**D-** | `$C=\sum_{1}s_{1i}$` | `$D=\sum_{0}s_{0i}$` | 
|Total  | `$N_1$`  | `$N_0$`|

- Incidence proportions: `$A/{N_1}$` and `$B/N_0$`, interpretable as average risks in their respective groups.
- Incidence odds: `$A/C$` and `$B/D$`, interpretable as ratios of the average risk to the average survival probability.

---

## Defining the counterfactual

- Assuming **no confounding** ** `$^1$`**

- Had the **exposure** been absent from the `$E+$` group,

- The average risk would have been **the same** among the sub-cohorts that were in fact exposed and unexposed.

`$\left(\frac{\sum_{1}r_{0i}}{N_1}\right) = \left(\frac{\sum_{0}r_{0i}}{N_0}\right)$`

<br>
`$^1$` More on confounding next lecture.

---

Thus: The risk difference is interpretable as both:

1) the **absolute change in the average risk** of the exposed sub-cohort produced by exposure,

`$\left(\frac{\sum_{1}r_{1i}}{N_1}\right) - \left(\frac{\sum_{1}r_{0i}}{N_1}\right)$`

<br>

2) the **average absolute change in risk** produced by exposure among exposed individuals

`$\left(\frac{\sum_{1}(r_{1i} - r_{0i})}{N_1}\right)$`

[Expressions 1 and 2, on: Interpretation and choice of effect measures in epidemiologic analyses. S. Greenland (1987)](https://doi.org/10.1093/oxfordjournals.aje.a114593)
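
A quick numerical check that the two expressions coincide, using hypothetical individual risks for five exposed people (`r1` if exposed, `r0` if unexposed):

``` r
# Hypothetical individual risks for five exposed people
r1 <- c(0.60, 0.30, 0.10, 0.05, 0.20)   # risk if exposed
r0 <- c(0.20, 0.10, 0.05, 0.04, 0.20)   # risk if unexposed

change_in_average <- mean(r1) - mean(r0)   # expression 1
average_change    <- mean(r1 - r0)         # expression 2

all.equal(change_in_average, average_change)   # TRUE: differences average cleanly
```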

---
class: middle
### Risk Ratio
The incidence proportion ratio is given by

`$\left(\frac{A}{N_1}\right) / \left(\frac{B}{N_0}\right)$`

The risk ratio is interpretable as :

1) the **proportionate change in the average risk** of the exposed subcohort produced by exposure,

`$\left(\frac{\sum_{1}r_{1i}}{N_1}\right) / \left(\frac{\sum_{0}r_{0i}}{N_0}\right)$` `$=\left(\frac{\sum_{1}r_{1i}}{N_1}\right) / \left(\frac{\sum_{1}r_{0i}}{N_1}\right)$`

It **is not interpretable** as the average proportionate change in risk produced by exposure among exposed individuals:

`$\left(\frac{\sum_{1}(r_{1i}/ r_{0i})}{N_1}\right)$`

[Expressions 3 and 4, on: Interpretation and choice of effect measures in epidemiologic analyses. S. Greenland (1987)](https://doi.org/10.1093/oxfordjournals.aje.a114593)
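
The same hypothetical risks show why the second interpretation fails for ratios: the ratio of the averages and the average of the individual ratios are different numbers.

``` r
# Hypothetical individual risks for five exposed people
r1 <- c(0.60, 0.30, 0.10, 0.05, 0.20)   # risk if exposed
r0 <- c(0.20, 0.10, 0.05, 0.04, 0.20)   # risk if unexposed

ratio_of_averages <- mean(r1) / mean(r0)   # expression 3
average_of_ratios <- mean(r1 / r0)         # expression 4

round(c(ratio_of_averages = ratio_of_averages,
        average_of_ratios = average_of_ratios), 2)
```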

---
class: middle
## Incidence Proportion Ratio

However, **if the individual risk ratios are all equal** then the ratio of the average risks across exposure will be equal to the average of the individual risk ratios
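
A sketch of this special case, assuming hypothetical baseline risks and a common individual risk ratio of 2:

``` r
r0 <- c(0.20, 0.10, 0.05, 0.04, 0.20)   # hypothetical risks if unexposed
r1 <- 2 * r0                            # every individual has a risk ratio of 2

ratio_of_averages <- mean(r1) / mean(r0)
average_of_ratios <- mean(r1 / r0)

all.equal(ratio_of_averages, average_of_ratios)   # TRUE when individual RRs are constant
```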

---
class: middle
### A note on Risk Ratios

Risk Differences have a symmetric range [-1, 1]

**But** the risk Ratios have an **_asymmetric_** range:
- From 0 to 1, below the null
- From 1 to infinity above the null

This presents a challenge in the interpretation...
 
** .red[What's more "impressive" `$RR = 2$` or  `$RR= 0.2$` ?]**
...

---

Two ways to find out:

**Simple:** take the reciprocal of the value below the null: 1/0.2 = 5. Since 5 > 2, 
 - an RR of 0.2 is of larger magnitude (further away from the null) than an RR of 2.

**Elaborate:** take the absolute value of the natural logarithm (log or ln) of each value:
 - log(2) = 0.693
 - log(0.2) = -1.609
 
In absolute terms, **|log(0.2)| = 1.609 > |log(2)| = 0.693**

- Regression models for ratio measures generally operate on the log-scale (that's why we exponentiate to provide estimates and graph on the log scale).
 - Note: Recall that on the log10 or ln scale, the null for a ratio measure is 0, not 1 – because log(1) = ln(1) = 0.
 - .purple[Try it with 3 and 0.3, and with 5 and 0.5 and see what happens! :)]
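
Both approaches take one line each in R. The vector `rr` holds the values discussed on this slide plus the suggested pairs.

``` r
rr <- c(2, 0.2, 3, 0.3, 5, 0.5)

data.frame(RR         = rr,
           reciprocal = round(1 / rr, 2),        # the "simple" approach
           abs_log    = round(abs(log(rr)), 3))  # distance from the null on the log scale
```

The larger the `abs_log` value, the further the ratio is from the null, whichever side it falls on.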

---

### Incidence Odds Ratio

The incidence odds ratio is given by:

`$\left(\frac{A}{C}\right) / \left(\frac{B}{D}\right)$`

Thus, the odds ratio is interpretable as :

1) the proportionate change in the incidence odds in the exposed subcohort produced by exposure,

`$\left(\frac{\sum_{1}r_{1i}}{\sum_{1}s_{1i}}\right) / \left(\frac{\sum_{0}r_{0i}}{\sum_{0}s_{0i}}\right)$`

`$= \left(\frac{\sum_{1}r_{1i}}{\sum_{1}s_{1i}}\right) / \left(\frac{\sum_{1}r_{0i}}{\sum_{1}s_{0i}}\right)$`

<br>
It **is not interpretable** as the proportionate change in the average odds in the exposed produced by exposure:

`$\left(\frac{\sum_{1}w_{1_i}}{N_1}\right) / \left(\frac{\sum_{1}w_{0_i}}{N_1}\right)$`

---
class: middle
## Incidence Odds Ratio

- Furthermore, neither of the last two expressions is equivalent to the average of the individual odds ratios among the exposed

`$$\left(\frac{\sum_1(w_{1_i} / w_{0_i})}{N_1}\right)$$`

- The incidence odds ratio **(that we calculate)** lacks any simple interpretation in terms of exposure effect on the average risk or odds, or average exposure effect on individual risk or odds.
 
---
class: middle

### Incidence Odds Ratio

The incidence odds do not equal the simple averages of the risk odds:

`$\frac{A}{C} = \frac{\sum_{1}r_{1i}}{\sum_{1}s_{1i}} \neq \frac{\sum_{1}w_{1_i}}{N_1}$`

This severely handicaps the interpretability of measures based on the incidence odds.
 
- It is not a measure of average causal effect (the RR and RD are) (Greenland 1987)

- Cannot be relied upon to reveal confounding (Greenland et al., 1999)

---
class: middle
## Incidence Odds Ratio

- If the individual ORs are all equal (which is the assumption made by a logistic model), then the ratio of the average odds will equal the average of the individual odds ratios.

**But the incidence odds ratio need not equal that value!**

---
class: middle
## Incidence Odds Ratio

- For example: define a population where 10% of people have `$r_{1i}=$` 0.60 and `$r_{0i}=$` 0.20, and 90% of people have `$r_{1i}=$` 0.035 and `$r_{0i}=$` 0.006.

- Here, the individual ORs = 6.0 for every individual 
 - the average of the individual ORs = 6.0

- Also, the ratio of the average odds equals 6.0 as well

But, the incidence odds ratio is equal to 3.9

**<span style="color:magenta"> Want to give it a try and calculate it? </span>**

---
class: middle
## Incidence Odds Ratio

* Because of this fact, the crude odds ratio can be smaller than any of the stratum-specific odds ratios, even if confounding is entirely absent

* This paradoxical behaviour will not occur with the risk difference or the risk ratio
--

* Unless both equal the null (OR = RR = 1), the OR will _**almost always**_ be further from the null than the RR.
  - The second exception is when **OR = RR = 0**, as would occur when the risk in the exposed is 0 (zero) and the risk in the unexposed is, for example, 0.6. In this case, the risk ratio is 0 and the OR is odds(0)/odds(0.6), which is also 0.

More on [Epidemiology by design by Daniel Westreich](https://academic.oup.com/book/32358)
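
The behaviour described above can be checked with a small numeric sketch. The two strata and their risks below are hypothetical (not from the lecture), and the strata are assumed to be of equal size with the same exposure prevalence, so there is no confounding by stratum.

``` r
# Hypothetical strata of equal size; exposure prevalence is the same in both,
# so the stratum variable is unrelated to exposure (no confounding).
risk_exposed   <- c(0.60, 0.20) # risk if exposed, strata 1 and 2
risk_unexposed <- c(0.20, 0.04) # risk if unexposed, strata 1 and 2

odds <- function(p) p / (1 - p)

# Stratum-specific measures
stratum_or <- odds(risk_exposed) / odds(risk_unexposed)
stratum_rr <- risk_exposed / risk_unexposed
stratum_rd <- risk_exposed - risk_unexposed

# Crude (collapsed) risks: simple means because the strata are equal in size
crude_r1 <- mean(risk_exposed)
crude_r0 <- mean(risk_unexposed)

crude_or <- odds(crude_r1) / odds(crude_r0)
crude_rr <- crude_r1 / crude_r0
crude_rd <- crude_r1 - crude_r0

round(c(stratum_or, crude = crude_or), 2)
round(c(stratum_rr, crude = crude_rr), 2)
round(c(stratum_rd, crude = crude_rd), 2)
```

Under these assumptions the crude OR falls below both stratum-specific ORs, while the crude RR and RD stay between their stratum-specific values.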

---
class: middle

## .red[Odds difference?]

<br>
--

<br>
  ⚠️  ‼️ ** .red[Nope, nope, nope!!!!!!!]** ⚠️ ‼️

<br>

❌ ‼️ ** .red[We never do this.]** ‼️ ❌

<br>

---
class: middle
### Some simulated Examples - Common Outcome

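``` r
library(logbin) # provides logbin() for log-binomial (risk ratio) models

set.seed(7042025)

# Simulate n subjects with a binary exposure E and a common binary outcome Y
yea.dat1 <- function(n) {
  E <- rbinom(n, 1, 0.55) # exposure prevalence: 0.55
  Y <- ifelse(E == 1, rbinom(n, 1, 0.85), rbinom(n, 1, 0.45)) # risk: 0.85 if exposed, 0.45 if not
  return(data.frame(E = E, Y = Y)) # return the simulated data set
}

# 100 simulated data sets of 400 subjects each
sim100 <- lapply(1:100, FUN = function(x) yea.dat1(400))

# Risk ratio for E from a log-binomial model fitted to each data set
RRs <- sapply(sim100, FUN = function(x) {
  results <- logbin(Y ~ E, data = x)$coef
  return(round(exp(results[names(results) == "E"]), 2))
})

# Odds ratio for E from a logistic model fitted to each data set
ORs <- sapply(sim100, FUN = function(x) {
  results <- glm(Y ~ E, family = "binomial", data = x)$coef
  return(round(exp(results[names(results) == "E"]), 2))
})

# 5th, 50th and 95th percentiles across the 100 simulations
sim_RRs1 <- round(quantile(RRs, probs = c(0.05, 0.5, 0.95)), 2)
sim_ORs1 <- round(quantile(ORs, probs = c(0.05, 0.5, 0.95)), 2)
```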

---
class: middle
### Some simulated Examples - Common Outcome

```
##            5%  50%   95%
## sim_RRs1 1.65 1.90  2.22
## sim_ORs1 4.73 7.12 10.52
```
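
As a reference point for the simulated percentiles, the values implied by the data-generating risks in `yea.dat1()` (0.85 in the exposed, 0.45 in the unexposed) can be computed directly. This is a small sketch; the object names are illustrative.

``` r
# Risks used to generate Y in yea.dat1()
risk_exposed   <- 0.85
risk_unexposed <- 0.45

# Risk ratio and odds ratio implied by those risks
true_RR <- risk_exposed / risk_unexposed
true_OR <- (risk_exposed / (1 - risk_exposed)) /
  (risk_unexposed / (1 - risk_unexposed))

round(c(true_RR = true_RR, true_OR = true_OR), 2)
```

With an outcome this common, the odds ratio sits far from the risk ratio, which is the pattern the simulated medians show.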

---
class: middle
# Key Takeaways

#### Only incidence differences and incidence ratios possess direct interpretations as measures of impact on average risk/hazard

<br>

#### Consequently, odds ratios are useful only when they serve as incidence ratio estimates

---

class: middle
### Incidence Rate Differences and Ratios
.pull-left[
**From everyone's time...**
<img src="L5_EPIB704_Association_25_files/figure-html/unnamed-chunk-39-1.svg" width="80%" style="display: block; margin: auto;" />
]

.pull-right[
**... To contrast outcomes and time**
<img src="L5_EPIB704_Association_25_files/figure-html/unnamed-chunk-40-1.svg" width="80%" style="display: block; margin: auto;" />

]

---

class: middle
### Incidence Rate Differences and Ratios
- Incidence rate of outcome Y when X=1 is `$IR(Y=1|X=1) = IR(Y|X=1)$`

- Incidence rate of outcome Y when X=0 is `$IR(Y=1|X=0) = IR(Y|X=0)$`
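
A minimal sketch of how the two rates are contrasted, using hypothetical event counts and person-time (none of these numbers come from the lecture):

``` r
# Hypothetical cohort data: events and person-years by exposure group
events_x1 <- 30   # events among the exposed
py_x1     <- 1000 # person-years among the exposed
events_x0 <- 15   # events among the unexposed
py_x0     <- 1500 # person-years among the unexposed

ir_x1 <- events_x1 / py_x1 # IR(Y | X = 1)
ir_x0 <- events_x0 / py_x0 # IR(Y | X = 0)

ird <- ir_x1 - ir_x0 # incidence rate difference
irr <- ir_x1 / ir_x0 # incidence rate ratio

c(IR_exposed = ir_x1, IR_unexposed = ir_x0, IRD = ird, IRR = irr)
```

The difference keeps the units of the rates (events per person-year), whereas the ratio is unitless.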

---

> "_Extrapolating data from the study sites, from 2015 to 2022, the incidence rate of childhood anaphylaxis emergency visits in Singapore doubled from 18.9 to 38.8 per 100,000 person-years, with an **incidence rate ratio (IRR) of 2.06 (95% CI: 1.70–2.49)**. In 2022, the incidence rate of food anaphylaxis was 30.1 per 100,000 person-years, IRR 2.39 (95% CI 1.90–3.01) and drug anaphylaxis was 4.6 per 100,000 person-years, IRR 1.89 (95% CI 1.11–3.25)._"

<img src="images/L4_rates_singapore0.jpg" width="60%" style="display: block; margin: auto;" />
[Trends in Childhood Anaphylaxis in Singapore: 2015–2022](https://doi.org/10.1111/cea.14528)

---

Question: If we calculated the risk ratio, odds ratio, and rate ratio, which would be closest to the null? Which would be furthest from the null?

Notation:
- `$R$` = Incidence Proportion (“Risk”), 
- `$S = 1-R$` (“Survival Proportion”),
- `$I$` = Incidence Rate
- `$T$` = interval length
- `$i$` = 1 if exposed; `$i$` = 0 if unexposed

---
class: middle
### Relationship among Risk, Odds, and Incidence Rates

Relations among relative risks
* In a closed population where the population at risk declines only slightly over the interval 
(implying that `$R$` must be small and `$S$` is close to 1): `$R \cong I\Delta T \cong R/S$`

This implies:

`$$\left(\frac{R_1}{R_0}\right) \cong \left(\frac{I_1\Delta T_1}{I_0\Delta T_0}\right) \cong \left(\frac{I_1}{I_0}\right) \cong \left(\frac{R_1/S_1}{R_0/S_0}\right)$$`

<span style="color:darkred">[Numerators]</span> Holds if `$R_1$` and `$R_0$` are small enough so that `$S_1$` and `$S_0$` are close to 1
 
<span style="color:darkred">[Two Denominators on the right]</span> Holds if exposure has only negligible effects on the person-time at risk 
 
---
class: middle
## Relationship among relative risks

- If exposure causes the outcome, then `$R_1>R_0$` and `$S_1<S_0$`, such that:

`$1 < \left(\frac{R_1}{R_0}\right) < \left(\frac{R_1}{R_0}\right) \times \left(\frac{S_0}{S_1}\right) = \left(\frac{R_1/S_1}{R_0/S_0}\right)$`

- If exposure prevents the outcome, then `$R_1<R_0$` and `$S_1>S_0$`, such that:

`$1 > \left(\frac{R_1}{R_0}\right) > \left(\frac{R_1}{R_0}\right) \times \left(\frac{S_0}{S_1}\right) = \left(\frac{R_1/S_1}{R_0/S_0}\right)$`

In words: **The odds ratio is further from the null than the risk ratio**
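
The two inequalities can be checked numerically. The risks below are hypothetical, chosen only to illustrate one harmful and one preventive scenario; `compare_rr_or()` is an illustrative helper, not a function from any package.

``` r
odds <- function(p) p / (1 - p)

# Risk ratio and odds ratio for a given pair of risks
compare_rr_or <- function(r1, r0) {
  c(RR = r1 / r0, OR = odds(r1) / odds(r0))
}

harmful    <- compare_rr_or(r1 = 0.30, r0 = 0.10) # exposure raises risk
preventive <- compare_rr_or(r1 = 0.10, r0 = 0.30) # exposure lowers risk

round(rbind(harmful, preventive), 2)
```

In both rows the OR lies on the same side of 1 as the RR, but further away from it.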

---
class: middle
## Relationship among relative risks

- Now, if exposure is harmful `$(R_1>R_0)$`, then we would ordinarily expect exposure to reduce the person-time at risk `$(T_1<T_0)$`, 
- and if exposure is preventive `$(R_1<R_0)$`, then we expect exposure to increase the person-time at risk `$(T_1>T_0)$`.
- Thus, when exposure is .red[harmful]:

`$$1 < \left(\frac{R_1}{R_0}\right) \cong \left(\frac{I_1\Delta T_1}{I_0\Delta T_0}\right) < \left(\frac{I_1}{I_0}\right)$$`

---

class: middle
## Relationship among relative risks
And when exposure is .blue[preventive]:

`$$1 > \left(\frac{R_1}{R_0}\right) \cong \left(\frac{I_1\Delta T_1}{I_0\Delta T_0}\right) > \left(\frac{I_1}{I_0}\right)$$`
- In words: We would **ordinarily** expect the risk ratio to be closer to the null than the rate ratio. Under further conditions, the rate ratio will be closer to the null than the odds ratio (Greenland and Thomas, 1982)

---
class: middle
## Relationship among relative risks

Thus, we usually expect:
- .blue[Risk ratio nearest to the null]
 - _implicitly_ suggesting all events occur at the end of follow up
- .red[Odds ratio furthest from the null]
 - _implicitly_  suggesting all events occur at the beginning of follow up
- .purple[Rate ratio somewhere in between]
 - allows event to occur at any point in time

For a harmful exposure: **1 < Risk Ratio < Rate Ratio < Odds Ratio** (the inequalities reverse for a preventive exposure)
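
A sketch of this ordering under simplifying assumptions that are not part of the slides: a closed cohort, constant incidence rates, and no competing risks, so that the risk over the interval is `$R = 1 - e^{-IT}$`. The rates and follow-up time are hypothetical.

``` r
# Assumptions (illustrative only): constant rates, closed cohort, no competing risks
rate_x1   <- 0.20 # hypothetical incidence rate in the exposed, per person-year
rate_x0   <- 0.10 # hypothetical incidence rate in the unexposed, per person-year
follow_up <- 5    # years of follow-up

# Risk over the interval under a constant rate: R = 1 - exp(-I * T)
risk_x1 <- 1 - exp(-rate_x1 * follow_up)
risk_x0 <- 1 - exp(-rate_x0 * follow_up)

odds <- function(p) p / (1 - p)

ratios <- c(
  risk_ratio = risk_x1 / risk_x0,
  rate_ratio = rate_x1 / rate_x0,
  odds_ratio = odds(risk_x1) / odds(risk_x0)
)
round(ratios, 2)
```

Shortening `follow_up` (so that risks become small) pulls the three ratios together, in line with the rare-outcome approximation discussed earlier.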

`$^1$` More on [Epidemiology by design by Daniel Westreich](https://academic.oup.com/book/32358)

---
class: middle
### Prevalence Ratios
Recall that the prevalence odds is equal to the incidence rate multiplied by the average duration in a stationary, closed population. 
This implies:

`$$POR = \left(\frac{PO_1}{PO_0}\right) = \left(\frac{I_1 \overline{D_1}}{I_0 \overline{D_0}}\right) = \left(\frac{I_1}{I_0}\right)$$`

if the average duration of disease is unaffected by exposure.

---

class: middle
## Prevalence Ratios
- If prevalence is low, then the prevalence is approximately equal to the incidence rate multiplied by the average duration. This implies:

`$$PR = \left(\frac{P_1}{P_0}\right) \cong \left(\frac{I_1 \overline{D_1}}{I_0 \overline{D_0}}\right) = \left(\frac{I_1}{I_0}\right)$$`

if the average duration of disease is unaffected by exposure
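
These relationships can be sketched with hypothetical steady-state inputs (the rates and duration below are made up for illustration), assuming the same average duration in both exposure groups:

``` r
# Hypothetical steady-state inputs
rate_x1  <- 0.02 # incidence rate in the exposed, per person-year
rate_x0  <- 0.01 # incidence rate in the unexposed, per person-year
duration <- 2    # average disease duration in years, same in both groups

# Prevalence odds = incidence rate x average duration
po_x1 <- rate_x1 * duration
po_x0 <- rate_x0 * duration

# Convert prevalence odds to prevalence: P = PO / (1 + PO)
prev_x1 <- po_x1 / (1 + po_x1)
prev_x0 <- po_x0 / (1 + po_x0)

c(IRR = rate_x1 / rate_x0, POR = po_x1 / po_x0, PR = prev_x1 / prev_x0)
```

Here the prevalence odds ratio reproduces the rate ratio, while the prevalence ratio is only close to it because prevalence is low.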

---
class: middle
**...Still breathing?**

⚠️ **.red[WARNING: SENSITIVE IMAGE]** ⚠️ 🙅

**.red[Small reaction]**

[The Big Bang Theory Gifs](https://fyeahbigbangtheorygifs.tumblr.com/post/9854726239)

---
class: middle

**Key Points**
 
> - _"There was a decreasing incidence of allergic reactions overall, in both sexes and all age subgroups. The decrease in frequency did not depend on age or sex."_
> - _"Our multivariable logistic regression analysis showed that GPI IIb/IIIa, LMWH, previous PCI, contrast dose, and x-ray dose were independent predictors of allergic reaction occurrence."_

`$^1$` [Allergic reactions during coronary angiography or PCI in Poland: Occurrence, trends, and long-term perspective based on the Polish ORPKI registry. Polish Heart Journal (Kardiologia Polska) Vol 83, No 3 (2025)](https://journals.viamedica.pl/polish_heart_journal/article/view/104049/83023)

---
class: middle
**Illustrated Example:**
<img src="images/L5_PCI_polandtable3.png" width="50%" style="display: block; margin: auto;" />

`$^1$` [Allergic reactions during coronary angiography or PCI in Poland (2025)](https://journals.viamedica.pl/polish_heart_journal/article/view/104049/83023)

---
class: middle
**Illustrated Example... Let's reproduce the analysis**
.pull-left[
<img src="images/allergic_PCI_polland2.png" width="120%" style="display: block; margin: auto;" />
]
--
.pull-right[
<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'>
 <thead>
<tr>
<th style="empty-cells: hide;" colspan="1"></th>
<th style="padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #00000020; padding-bottom: 5px; ">Outcome</div></th>
<th style="empty-cells: hide;" colspan="1"></th>
</tr>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Anaphylaxis </th>
   <th style="text-align:right;"> Non-anaphylaxis </th>
   <th style="text-align:right;"> TOTAL </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Male </td>
   <td style="text-align:right;"> 687 </td>
   <td style="text-align:right;"> 1144002 </td>
   <td style="text-align:right;"> 1144689 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Female </td>
   <td style="text-align:right;"> 462 </td>
   <td style="text-align:right;"> 658910 </td>
   <td style="text-align:right;"> 659372 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Total </td>
   <td style="text-align:right;"> 1149 </td>
   <td style="text-align:right;"> 1802912 </td>
   <td style="text-align:right;"> 1804061 </td>
  </tr>
</tbody>
</table>
]

.small[[Allergic reactions during coronary angiography or PCI in Poland. Polish Heart Journal (Kardiologia Polska) Vol 83, No 3 (2025)](https://journals.viamedica.pl/polish_heart_journal/article/view/104049/83023)]

---
class: middle
**Illustrated Example... Let's reproduce the analysis from Figure 1**

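``` r
library(epiR) # provides epi.2by2()

# Cell counts in the order: exposed cases, exposed non-cases,
# unexposed cases, unexposed non-cases (male = exposed)
pcidat1 <- c(687, 1144002, 462, 658943)

pciOR1 <- epi.2by2(pcidat1, method = "cross.sectional")
pciOR1$tab
```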

_"No gender differences"_ **.red[(???)]**

``` r
round(pciOR1$massoc.detail$OR.strata.wald, 2)
```

```
##    est lower upper
## 1 0.86  0.76  0.96
```

---
class: middle
**Illustrated Example... Let's reproduce the analysis**

``` r
pciOR1<- epi.2by2(pcidat1, method = "cross.sectional")
pciOR1
```

```
##              Outcome +    Outcome -      Total               Prev risk *
## Exposed +          687      1144002    1144689       0.06 (0.06 to 0.06)
## Exposed -          462       658943     659405       0.07 (0.06 to 0.08)
## Total             1149      1802945    1804094       0.06 (0.06 to 0.07)
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Prev risk ratio                                0.86 (0.76, 0.96)
## Prev odds ratio                                0.86 (0.76, 0.96)
## Attrib prev in the exposed *                   -0.01 (-0.02, -0.00)
## Attrib fraction in the exposed (%)            -16.74 (-31.35, -3.76)
## Attrib prev in the population *                -0.01 (-0.01, 0.00)
## Attrib fraction in the population (%)         -10.01 (-18.04, -2.52)
## -------------------------------------------------------------------
## Uncorrected chi2 test that OR = 1: chi2(1) = 6.635 Pr>chi2 = 0.010
## Fisher exact test that OR = 1: Pr>chi2 = 0.011
##  Wald confidence limits
##  CI: confidence interval
##  * Outcomes per 100 population units
```
.red[How do I obtain the] `$\chi^2$`  ?
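
One way to obtain it by hand, sketched in base R with the same four counts as `pcidat1`: compute the expected counts under independence and sum the squared deviations. The object names are illustrative; the result should agree with the uncorrected chi2 in the output above.

``` r
# 2x2 table with the same counts as pcidat1
tab <- matrix(c(687, 1144002,
                462,  658943),
              nrow = 2, byrow = TRUE,
              dimnames = list(Sex = c("Male", "Female"),
                              Outcome = c("Anaphylaxis", "No anaphylaxis")))

# Expected counts under independence: (row total x column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson chi-square: sum of (observed - expected)^2 / expected
chi2_manual <- sum((tab - expected)^2 / expected)
p_manual <- pchisq(chi2_manual, df = 1, lower.tail = FALSE)

# Same statistic from base R, without the continuity correction
chi2_base <- chisq.test(tab, correct = FALSE)

c(manual = chi2_manual, base = unname(chi2_base$statistic), p = p_manual)
```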

---
class: middle
**Illustrated Example... What's in Table 3?**

``` r
pcidat1a<-c(725, 1144689, 450, 659372)
pciOR1a<- epi.2by2(pcidat1a, method = "cross.sectional")
pciOR1a
```

```
##              Outcome +    Outcome -      Total               Prev risk *
## Exposed +          725      1144689    1145414       0.06 (0.06 to 0.07)
## Exposed -          450       659372     659822       0.07 (0.06 to 0.07)
## Total             1175      1804061    1805236       0.07 (0.06 to 0.07)
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Prev risk ratio                                0.93 (0.83, 1.04)
## Prev odds ratio                                0.93 (0.83, 1.04)
## Attrib prev in the exposed *                   -0.00 (-0.01, 0.00)
## Attrib fraction in the exposed (%)            -7.75 (-21.19, 4.20)
## Attrib prev in the population *                -0.00 (-0.01, 0.00)
## Attrib fraction in the population (%)         -4.78 (-12.67, 2.55)
## -------------------------------------------------------------------
## Uncorrected chi2 test that OR = 1: chi2(1) = 1.548 Pr>chi2 = 0.213
## Fisher exact test that OR = 1: Pr>chi2 = 0.214
##  Wald confidence limits
##  CI: confidence interval
##  * Outcomes per 100 population units
```

---
class: middle
**<span style="color:darkred">What's clear and what's not here?</span>**

.pull-left[
**Results**
<img src="images/L5_PCI_results.png" width="100%" style="display: block; margin: auto;" />
.small[[Allergic reactions during coronary angiography or PCI in Poland. Polish Heart Journal (Kardiologia Polska) Vol 83, No 3 (2025)](https://journals.viamedica.pl/polish_heart_journal/article/view/104049/83023)]
]

--
.pull-right[
**Methods**
<img src="images/L5_PCIstats.png" width="100%" style="display: block; margin: auto;" />
]

---
class: middle
**Illustrated Example**
<img src="images/L5_PCI_polandtable3.png" width="50%" style="display: block; margin: auto;" />

`$^1$` [Allergic reactions during coronary angiography or PCI in Poland. (2025)](https://journals.viamedica.pl/polish_heart_journal/article/view/104049/83023)

---
class: middle

**<span style="color:darkred">What's clear and what's not here?</span>**

### .red[ What would be the actual interpretation of these ORs? ]

---

class: middle
**Illustrated Example**
> "_When assessing the incidence of allergic reactions ..., there was a decrease in the frequency of allergic reactions regardless of acute coronary syndromes (ACS) (Table 1)._"
> "_...the risk of allergic reactions was increased by... a diagnosis of ACS_"

``` r
pcidat2<-c(695, 955482, 482, 857512)
pciOR2<- epi.2by2(pcidat2, method = "cross.sectional")
pciOR2
```

```
##              Outcome +    Outcome -      Total               Prev risk *
## Exposed +          695       955482     956177       0.07 (0.07 to 0.08)
## Exposed -          482       857512     857994       0.06 (0.05 to 0.06)
## Total             1177      1812994    1814171       0.06 (0.06 to 0.07)
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Prev risk ratio                                1.29 (1.15, 1.45)
## Prev odds ratio                                1.29 (1.15, 1.45)
## Attrib prev in the exposed *                   0.02 (0.01, 0.02)
## Attrib fraction in the exposed (%)            22.71 (13.19, 31.19)
## Attrib prev in the population *                0.01 (0.00, 0.01)
## Attrib fraction in the population (%)         13.41 (7.26, 19.15)
## -------------------------------------------------------------------
## Uncorrected chi2 test that OR = 1: chi2(1) = 19.007 Pr>chi2 = <0.001
## Fisher exact test that OR = 1: Pr>chi2 = <0.001
##  Wald confidence limits
##  CI: confidence interval
##  * Outcomes per 100 population units
```

---
class: middle
### What did the authors do with the covariates?

--
<img src="images/L5_bbt_howard.gif" width="80%" style="display: block; margin: auto;" />
[The Big Bang Theory Gifs](https://fyeahbigbangtheorygifs.tumblr.com/post/9854726239)

---
class: middle
### When to use which measure of Association

- Research Question
- Public Health Relevance
- Intervention (Design of/ Intervenable exposure?)
- Study design

- Contrasts require assignment to groups, which in turn requires measuring group membership (for example, measuring exposure).
  - What will happen if the exposure is measured poorly?
- Since risks are only well-defined within a specific time-period, state that time-period.

---

class: middle
### Things to consider when measuring associations
Be aware that in real life, we encounter:

- Random error
- Systematic error
  - Competing risks, confounding, selection bias, measurement error
- Methods-related limitations
- Clinical vs. statistical hurdles

---
class: middle

### Contrast (Association vs Impact)
Most contrasts can be used to assess either:

- Association (which is agnostic on the question of causality)

- Impact (which is causal). The risk difference, for example.

Certain measures, however, are implicitly causal and (probably) shouldn’t be used to merely describe an association.

- Among these latter measures are **number needed to treat, and attributable contrasts**.

---

###  Population Attributable Fraction (PAF)

A population attributable fraction (PAF) can be thought of as
- _"The proportion of disease burden among the total population which is caused by the exposure"_.
- That definition is explicitly causal. 
- PAF is implicitly causal. 
- “Attribution” implies cause, though we can argue over usage.
 
Calculated as `$(P(Y) - P(Y^{x=0}))/ P(Y)$` where,

- `$P(Y)$` is the risk of the outcome in the whole population,

- `$P(Y^{x=0})$` is the risk of the outcome if all exposure were removed; absent confounding, it can be estimated by the risk in the unexposed, `$P(Y|X=0)$`.

Note that because most outcomes are caused by more than one thing, the sum of PAFs can be (and often is) greater than 100%. .small[[Epidemiology by design by Daniel Westreich](https://academic.oup.com/book/32358)]
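
A sketch of the calculation, reusing the counts from the anaphylaxis-by-sex example (male treated as the "exposed" group). It uses the observed risk in the unexposed as a stand-in for `$P(Y^{x=0})$`, which is an assumption (no confounding) rather than something the data guarantee, so the number is an arithmetic illustration and not a causal claim.

``` r
# Counts from the anaphylaxis-by-sex example (male = "exposed")
cases_exposed   <- 687
n_exposed       <- 687 + 1144002
cases_unexposed <- 462
n_unexposed     <- 462 + 658943

# P(Y): risk in the whole population
risk_total <- (cases_exposed + cases_unexposed) / (n_exposed + n_unexposed)

# P(Y | X = 0): risk in the unexposed, used here in place of P(Y^{x=0})
risk_unexposed <- cases_unexposed / n_unexposed

paf <- (risk_total - risk_unexposed) / risk_total
round(100 * paf, 2) # expressed as a percentage
```

The result should land close to the "Attrib fraction in the population (%)" line reported by `epi.2by2` for the same counts.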

---

class: middle
**Example:**
- Exposure to TB is, by definition, a necessary cause of active TB.
  - So from the above, `$P(Y^{\text{TB exposure}=0}) = 0\%$`, and so PAF = 100%.
- But not everyone exposed to TB develops active TB. There are other causes, e.g., being immunocompromised. 
  - RD for immunocompromised status > 0; PAF > 0.
  
- `$PAF_{\text{TB exposure}} + PAF_{\text{immunocompromised}} > 100\%$`

This concept can be tied to Rothman’s causal pies model of causality .small[[Epidemiology by design by Daniel Westreich](https://academic.oup.com/book/32358)]

---
class: middle
### A measure related to PAFs

The **population attributable risk difference** is the difference between the risk of the outcome in the observed population and the risk of the outcome **if all exposure were removed**, that is:

`$$P(Y) - P(Y^{x=0})$$`

- The potential outcomes notation is _deliberate_ here.

[Epidemiology by design by Daniel Westreich](https://academic.oup.com/book/32358)

---
class: middle
### Remember our anaphylaxis by sex example?

**Assuming a cohort design:**

``` r
library(kableExtra) # provides kbl()

pcidat3 <- c(687, 1144002, 462, 658943)

pciOR3 <- epi.2by2(pcidat3, method = "cohort.count")
kbl(pciOR3$massoc.summary, digits = 2)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> var </th>
   <th style="text-align:right;"> est </th>
   <th style="text-align:right;"> lower </th>
   <th style="text-align:right;"> upper </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Inc risk ratio </td>
   <td style="text-align:right;"> 0.86 </td>
   <td style="text-align:right;"> 0.76 </td>
   <td style="text-align:right;"> 0.96 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Inc odds ratio </td>
   <td style="text-align:right;"> 0.86 </td>
   <td style="text-align:right;"> 0.76 </td>
   <td style="text-align:right;"> 0.96 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Attrib inc risk * </td>
   <td style="text-align:right;"> -0.01 </td>
   <td style="text-align:right;"> -0.02 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Attrib fraction in exposed (%) </td>
   <td style="text-align:right;"> -16.74 </td>
   <td style="text-align:right;"> -31.35 </td>
   <td style="text-align:right;"> -3.76 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Attrib inc risk in population * </td>
   <td style="text-align:right;"> -0.01 </td>
   <td style="text-align:right;"> -0.01 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Attrib fraction in population (%) </td>
   <td style="text-align:right;"> -10.01 </td>
   <td style="text-align:right;"> -18.04 </td>
   <td style="text-align:right;"> -2.52 </td>
  </tr>
</tbody>
</table>

---
class: middle
### Number Needed to Treat

The number needed to treat (NNT) is the number of individuals we would need to treat in order to **prevent** one bad outcome.

The NNT is calculated as `$|RD|^{-1}$`
 - The inverse of the absolute value of the risk difference.
- For a harmful exposure, we **keep the absolute value**, but describe the measure as a _**number needed to harm**_.

[Epidemiology by design by Daniel Westreich](https://academic.oup.com/book/32358)

---

### NNT

If the 5-year risk of death among treated is 10%, and among untreated is 20%,

- then how many people do you need to treat to prevent one death over five years?
 - `$|10{\%} - 20{\%}|^{-1}$` = `$|-10{\%}|^{-1} = 0.10^{-1} = 10$`.
- We must treat 10 people to prevent one death over five years.

Notation: `$NNT = 1/|(P(Y|X=1) - P(Y|X=0))|$`
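
The same arithmetic as a short sketch. `nnt()` is an illustrative helper, not a function from a package, and the second pair of risks is hypothetical.

``` r
# NNT as the inverse of the absolute risk difference
nnt <- function(risk_treated, risk_untreated) {
  rd <- risk_treated - risk_untreated
  1 / abs(rd)
}

# The example above: 10% risk if treated, 20% if untreated
nnt_slide <- nnt(risk_treated = 0.10, risk_untreated = 0.20)
nnt_slide

# A hypothetical case where the result is not a whole number
nnt_other <- nnt(risk_treated = 0.10, risk_untreated = 0.17)
ceiling(nnt_other) # rounded up to the next whole person
```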

What’s the null value?

[Epidemiology by design by Daniel Westreich](https://academic.oup.com/book/32358)

---
### NNT

- By convention, the NNT is often rounded up to the nearest integer (we can’t treat half a person).
  - This is a conservative approach that doesn’t always make sense.

- NNT implies causality more strongly than risk difference.
- Risk differences can be viewed as descriptive (although you should be specific and cautious about that)
- Just the difference in observed risks between two groups: not necessarily due to group identification.
- In contrast, NNTs explicitly discuss a treatment having a result: thus are expressing a causal effect.
- Explaining an NNT can become tricky outside of a trial setting because exposure (smoking) isn’t always the same as treatment (cognitive behavioral therapy for smoking cessation).

[Epidemiology by design by Daniel Westreich](https://academic.oup.com/book/32358)

---
class: middle

###  QUESTIONS?

## COMMENTS?

# RECOMMENDATIONS?

---

[5 emergency steps for treating anaphylaxis](https://foodallergycanada.ca/wp-content/uploads/5-emergency-steps-for-treating-anaphylaxis-resource.pdf)

[Know what to do in case of emergency](https://www.oma.org/siteassets/oma/media/public/anaphylaxishandout.pdf)

---

- For example, define a population where:
  - 10% of people have `$r_{1i}=$` 0.60 and `$r_{0i}=$` 0.20
  - 90% of people have `$r_{1i}=$` 0.035 and `$r_{0i}=$` 0.006

- Here, the individual ORs = 6.0 for every individual 
- the average of the individual ORs = 6.0

- Also, the ratio of the average odds equals 6.0 as well

But, the incidence odds ratio is equal to 3.9

---

## Incidence Odds Ratio - Calculations
**Rudimentary calculations**
.pull-left[

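``` r
# Individual odds in the 10% subgroup
0.6 / (1 - 0.6) # = 1.5  (odds if exposed, r1 = 0.60)
0.2 / (1 - 0.2) # = 0.25 (odds if unexposed, r0 = 0.20)

# Individual OR in the 10% subgroup
1.5 / 0.25 # = 6

# Individual odds in the 90% subgroup
0.035 / (1 - 0.035) # = 0.03626943  (r1 = 0.035)
0.006 / (1 - 0.006) # = 0.006036217 (r0 = 0.006)

# Individual OR in the 90% subgroup
0.03626943 / 0.006036217 # = 6.008636

# Ratio of the average odds, weighting each subgroup by its population share
(0.1 * 1.5) + (0.9 * 0.03626943)   # = 0.1826425 (average odds if exposed)
(0.1 * 0.25) + (0.9 * 0.006036217) # = 0.0304326 (average odds if unexposed)
0.1826425 / 0.0304326 # approximately 6.0
```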
]

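.pull-right[

``` r
## Incidence odds ratio (IOR)
# Average risks, weighting each subgroup by its population share
(0.6 * 0.1) + (0.035 * 0.9) # = 0.0915 (average risk if exposed)
(0.2 * 0.1) + (0.006 * 0.9) # = 0.0254 (average risk if unexposed)
# Incidence odds if exposed
0.0915 / (1 - 0.0915) # = 0.1007155
# Incidence odds if unexposed
0.0254 / (1 - 0.0254) # = 0.02606197
# IOR
0.1007155 / 0.02606197 # = 3.864462
```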
]