class: left, middle, inverse, title-slide

.title[
# Measurement Error - Core concepts
]
.author[
### Mabel Carabali
]
.institute[
### EBOH, McGill University
]
.date[
### updated: 2024-11-22
]

---
class: middle

<img src="images/biases.png" width="120%" style="display: block; margin: auto;" />

---
class: middle

**Expected competencies:**

- Knowledge about Information Bias:
  - Understand the concept of misclassification vs measurement error
  - Sources and effects/direction of the bias

--

.pull-left[
### Objectives

- To revise core concepts on measurement error.
- Structure, mechanisms, and implications of measurement error for the validity of epidemiological studies.
- To identify analytic tools to detect and address measurement error.
]

--

.pull-right[
### Outline

1. Random Error
2. Not so Random Error - (Un)conscious Bias
3. Systematic Error
]

---
class: middle

### What is random error? Does it depend on sampling?

- Rothman (Modern Epidemiology): _"all studies have random error, even if there is no sampling"_.
- [Greenland (1990)](https://journals.lww.com/epidem/Abstract/1990/11000/Randomization,_Statistics,_and_Causal_Inference.3.aspx): _"all studies have random elements, but our models for handling random error are based on sampling, and so these models are not appropriate when there is no sampling."_

--

**Precision is the inverse of variance**, so adding sample size is one common way to increase precision without sacrificing validity.

.small[[Savitz, David A., and Gregory A. Wellenius, 'Random Error', Interpreting Epidemiologic Evidence: Connecting Research to Applications, 2nd edn (New York, 2016; online edn, Oxford Academic, 17 Nov. 2016)](https://doi.org/10.1093/acprof:oso/9780190243777.003.0012)]

---
class: middle

<img src="images/var_prec.png" width="30%" style="display: block; margin: auto;" />

---
class: middle

### Trade-offs between bias and precision:

A measure of average "closeness" of an estimator to the parameter being estimated is the **Mean Square Error (MSE)** of the estimator.

- For `\(\theta^*\)` as an estimator of `\(\theta\)` (the real value), the MSE of `\(\theta^*\)` is defined as:

`$$MSE(\theta^*) = E[(\theta^* - \theta)^2]$$`

--

- This is just the mean of squared deviations between the estimate `\(\theta^*\)` and the true value `\(\theta\)`.
- How far we expect our `\(\theta^*\)` to deviate from `\(\theta\)` will depend on both random sampling variability and systematic bias.

Efron & Morris, Scientific American 1977; modified from Jay Kaufman, EPIB-704-2021.

---
class: middle

### Trade-offs between bias and precision:

The variance of an estimator `\(\theta^*\)` is defined as: `\(VAR(\theta^*) = E[(\theta^* - E[\theta^*])^2]\)`

- The MSE is the average squared deviation of the estimator from the parameter `\(\theta\)` being estimated,
- The variance is the average squared deviation of the estimator `\(\theta^*\)` from its expectation `\(E[\theta^*]\)`.

`$$BIAS(\theta^*) = E[\theta^*] - \theta$$`

--

So, if the estimator `\(\theta^*\)` is unbiased, `\(E[\theta^*] = \theta\)` and `\(MSE(\theta^*) = VAR(\theta^*)\)`. <br>

--

- If the estimator is biased, then:

`$$MSE(\theta^*) = VAR(\theta^*) + BIAS(\theta^*)^2$$`

To get the smallest possible value of `\(MSE(\theta^*)\)`, one must consider **bias and precision** .purple[(random sampling variability)].
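---
class: middle

### Bias and precision: a quick simulation

A minimal R sketch (not from the original slides; the estimator names and the 0.8 shrinkage factor are arbitrary choices) illustrating that `\(MSE = VAR + BIAS^2\)`, and that a biased estimator can still have the smaller MSE.

``` r
# Compare the unbiased sample mean with a deliberately biased (shrunken) mean.
set.seed(704)
theta  <- 0.5                                  # true parameter
n      <- 20                                   # observations per simulated study
n_sims <- 1e5                                  # number of simulated studies

unbiased <- replicate(n_sims, mean(rnorm(n, mean = theta)))
shrunken <- 0.8 * unbiased                     # biased: E[shrunken] = 0.8 * theta

mse  <- function(est) mean((est - theta)^2)
bias <- function(est) mean(est) - theta

# Each MSE matches VAR + BIAS^2 (up to simulation error); here the biased
# estimator wins because its variance reduction outweighs its squared bias.
c(mse = mse(unbiased), var_plus_bias2 = var(unbiased) + bias(unbiased)^2)
c(mse = mse(shrunken), var_plus_bias2 = var(shrunken) + bias(shrunken)^2)
```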
---
class: middle

### Trade-offs between bias and precision:

It is possible for the variance of a biased estimator to be sufficiently smaller than the variance of an unbiased estimator to more than compensate for the bias introduced.

- In this case, the biased estimator is closer, on average, to the parameter being estimated than is the unbiased estimator.

<img src="images/L19MEbias.png" width="60%" style="display: block; margin: auto;" />

Modified from Jay Kaufman, EPIB-704-2021.

---
class: middle

### Approaches to random error:

Mostly, sample size... BUT, in some cases, attaining the **ideal sample size** is a concern. So, aiming at efficiency, we use designs (e.g., case-control) and analytic techniques (e.g., matching, also used for systematic error at the design stage, as discussed before).

--

Or, deciding whether the study result can be attributed to random error or not. **<span style="color:red">NHST!!!</span>**

.red[
> - _"Since "chance alone" is never what is at play, the significance testing approach is useless."_
]

---
class: middle

### Random vs Systematic Error

<img src="images/randome_systematic.png" width="80%" height="150%" style="display: block; margin: auto;" />

---
class: middle

### What is random error?

- Inference is about the parameters of the super-population, not the sample.
- So, if there is no sampling, can we consider useful interpretations of these statistics?
- Another important problem with NHST is that "significant" results are, in expectation, overestimates.

For example: [Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008 Sep;19(5):640-8](https://journals.lww.com/epidem/fulltext/2008/09000/why_most_discovered_true_associations_are_inflated.2.aspx)

> "_Newly discovered true (non-null) associations often have inflated effects compared with the true effect sizes..."_

---
class: middle

### Why Most Discovered True Associations Are Inflated

> _"First, theoretical considerations prove that when true discovery is **claimed based on crossing a threshold of statistical significance** and the discovery study is **underpowered**, the observed effects are expected to be inflated._

> _"Second, flexible analyses coupled with **selective reporting** may inflate the published discovered effects._

> _"Third, effects may be **inflated at the stage of interpretation** due to diverse conflicts of interest._

> _"Discovered effects are not always inflated, and **under some circumstances may be deflated**, for example, in the setting of late discovery of associations in **sequentially accumulated overpowered evidence**, in some types of misclassification from **<span style="color:darkred">measurement error</span>**, and in **conflicts causing reverse biases**."_

[Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008 Sep;19(5):640-8](https://journals.lww.com/epidem/Fulltext/2008/09000/Why_Most_Discovered_True_Associations_Are_Inflated.2.aspx)

---
class: middle

### What is Information Bias/Measurement error?

**<span style="color:darkred">Systematic bias</span>** - it is not random error!

- But, it could occur at random (i.e., a systematic error (measurement error) that occurs to everyone in a random fashion).

Information bias is caused by measurement error in the exposure, outcome, or covariates.

---
class: middle

### What is Information Bias/Measurement error?

Two widely known forms:

1. Classical additive form: .blue[Observed] = .green[Truth] + .red[Error] <br>

--
2. Berkson error: .green[Truth] = .blue[Observed] + .red[Error] <br>

--

<br>

- Errors: zero mean and constant variance
- Also known as **misclassification** (for categorical variables); involves sensitivity and specificity.

---
class: middle

### What is Information Bias/Measurement error?

Information bias generally can result from a number of processes, including systematic bias in the collection of information or faulty instrumentation.

--

- It is an important form and source of systematic error, especially because even if it occurs completely at random, **it can introduce bias** into effect estimates.
- e.g., if random _noise_ makes two distinct groups look more similar, then comparisons between those groups may be attenuated.

**<span style="color:red">This bias is not necessarily improved by increases in sample size!</span>**

---
class: middle

### What is Information Bias/Measurement error?

Measurement Error and Misclassification can be:

+ .purple[Non-differential (independent errors)].
  - Systematic Error, occurring in a random fashion.
+ .purple[Differential (depends on another variable)].
  - Systematic Error, occurring NOT in a random fashion.

---
class: middle

### Measurement error

<img src="images/whammy.jpg" width="85%" style="display: block; margin: auto;" />

.small[[Reflection on modern methods: five myths about measurement error in epidemiological research. Int J Epidemiol, Volume 49, Issue 1, February 2020, Pages 338-347](https://doi.org/10.1093/ije/dyz251)]

---
class: middle

### Measurement error

<img src="images/whammy.jpg" width="50%" style="display: block; margin: auto;" />

.blue[
Dashed is the regression line without measurement error (Truth):

- Truth: outcome = exposure + e, e∼N(0, 0.6), exposure∼N(0, 1)
]

--

.red[
Solid is the regression line with measurement error (With ME):

- Truth + er, er∼N(0, 0.5)
]

.small[[Reflection on modern methods: five myths about measurement error in epidemiological research. Int J Epidemiol, Volume 49, Issue 1, February 2020, Pages 338-347](https://doi.org/10.1093/ije/dyz251)]

---
class: middle

### Measurement error

.pull-left[
<img src="images/me1.png" width="150%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="images/me2.png" width="150%" style="display: block; margin: auto;" />
]

---
class: middle

### Measurement error

.pull-left[
<img src="images/me3.png" width="150%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="images/me4.png" width="150%" style="display: block; margin: auto;" />
]

---
class: middle

### Measurement error

.pull-left[
<img src="images/me6.png" width="150%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="images/me5.png" width="150%" style="display: block; margin: auto;" />
]

---
class: middle

### Measurement error

<img src="images/whammy1.png" width="90%" style="display: block; margin: auto;" />

.small[[Reflection on modern methods: five myths about measurement error in epidemiological research. Int J Epidemiol, Volume 49, Issue 1, February 2020, Pages 338-347](https://doi.org/10.1093/ije/dyz251)]

---
class: middle

### What is Information Bias/Measurement error?

Measurement Error and Misclassification can be:

+ .purple[Non-differential (independent errors (in the estimation))].
  - Systematic Error, occurring in a random fashion.
+ .purple[Differential (depends on another variable; the errors are not independent)].
  - Systematic Error, occurring NOT in a random fashion.
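---
class: middle

### Measurement error: a simulation sketch

A hypothetical re-creation (my own, not the paper's code) of the simulation described for the regression figure a few slides back, assuming the stated quantities are standard deviations; it shows classical, non-differential error in the exposure attenuating the slope.

``` r
# Truth: outcome = exposure + e; the mismeasured exposure adds classical error.
set.seed(704)
n        <- 1000
exposure <- rnorm(n, 0, 1)                 # exposure ~ N(0, 1)
outcome  <- exposure + rnorm(n, 0, 0.6)    # true slope = 1
exp_star <- exposure + rnorm(n, 0, 0.5)    # observed = truth + error

coef(lm(outcome ~ exposure))["exposure"]   # close to 1 (no measurement error)
coef(lm(outcome ~ exp_star))["exp_star"]   # attenuated toward 0: about 1/(1 + 0.5^2) = 0.8
```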
---
class: middle

### Non-differential vs Differential misclassification

> Differential is scarier than non-differential misclassification (NDM).
>
> - With NDM, in 2 categories, we count on the idea that bias _will tend_ to be toward the null.
> - With DM, all bets are off. And with non-differential misclassification of an exposure with more than 2 levels, the bias can go in either direction.

> NDM wrinkles: "tend"
>
> _"With NDM, in 2 categories, we count on the idea that bias will tend to be toward the null."_
>
> <span style="color:red">True. But "tend" is "on average", which is NOT ALWAYS.</span>

[Epidemiology by Design](https://academic.oup.com/book/32358/chapter/268624216)

---
class: middle

### Causality and Measurement Error

[Hernán & Robins Chapter 9](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)

9.1: Sets up a model in which `\(A\)` is the exposure, `\(A^*\)` is the measured exposure, and the causes of `\(A^*\)` are `\(A\)` and `\(U_a\)`, which are all the other reasons that `\(A^*\)` takes its specific value other than `\(A\)` (and which can include random or obscure inputs).

9.2: Classification of the structure of measurement error according to two properties: independence and non-differentiality.

- Independence is about whether the error terms are correlated or not.

.pull-left[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-14-1.png" width="80%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-15-1.png" width="80%" style="display: block; margin: auto;" />
]

---
class: middle

### Hernán & Robins Chapter 9:

Non-differentiality is about whether **the exposure measurement error differs by outcome status**, or whether the outcome measurement error differs by exposure status.

.pull-left[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-16-1.png" width="80%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-17-1.png" width="80%" style="display: block; margin: auto;" />
]

---
class: middle

### Hernán & Robins Chapter 9:

9.3. **Effect of Mismeasured confounders**

.pull-left[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-18-1.png" width="80%" style="display: block; margin: auto;" />

``` r
#adjustmentSets(me5, type= "canonical") #<< Gives NULL results
paths(me5, "A", "Y")
```

```
## $paths
## [1] "A -> Y"      "A <- L -> Y"
## 
## $open
## [1] TRUE TRUE
```
]

--

.pull-right[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-20-1.png" width="80%" style="display: block; margin: auto;" />

``` r
adjustmentSets(me6, type= "canonical")
```

```
## { U }
```

``` r
paths(me6, "A", "Y")
```

```
## $paths
## [1] "A -> Y"           "A <- L <- U -> Y"
## 
## $open
## [1] TRUE TRUE
```
]

---
class: middle

### Hernán & Robins Chapter 9:

9.3. **Effect of Mismeasured confounders**

The worse our measure `\(C^*\)` of a covariate `\(C\)`, the less of a **collider** it actually is, and so adjusting for it doesn't create as much selection bias as expected from adjusting for a well-measured `\(C\)`.

.pull-left[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-22-1.png" width="80%" style="display: block; margin: auto;" />

``` r
adjustmentSets(me7, type= "canonical")
```

```
## {}
```
]

--

.pull-right[
``` r
ggdag_adjust(me7, var = c( "C_x"))+
  labs(title = " Bias")+
  theme_dag_blank()
```

<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-24-1.png" width="80%" style="display: block; margin: auto;" />
]
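---
class: middle

### Hernán & Robins Chapter 9: DAG sketch

The `me5`/`me6` objects above are `dagitty` DAGs whose definitions are not shown on the slides. Here is a hypothetical reconstruction of `me6`; declaring the mismeasured proxy `L` as latent is my assumption, made so the canonical adjustment set falls back on the true confounder `U`.

``` r
library(dagitty)

# U = true confounder, L = its mismeasured proxy, A = exposure, Y = outcome
me6_sketch <- dagitty('dag {
  L [latent]
  U -> L
  L -> A
  U -> Y
  A -> Y
}')

adjustmentSets(me6_sketch, "A", "Y", type = "canonical")  # expected: { U }
paths(me6_sketch, "A", "Y")  # "A -> Y" and the open back-door "A <- L <- U -> Y"
```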
---
class: middle

### Would a Mismeasured covariate create Bias?

--

**.red[YES... well, sometimes]**

--

### How much Bias? ... It depends!

--

- The .blue[magnitude of the measurement error] (i.e., how close or far the mismeasured covariate is from the true covariate).
- The .blue[strength of the association] between the correctly measured covariate and the exposure and/or outcome.
- Whether the misclassification of the covariate is .blue[differential by levels] of exposure or outcome.
- Whether the mismeasured covariate results in .blue[effect measure modification].
- Depending on .blue[the structure of the error] and whether information on the mismeasured covariate is .blue[sufficient to block open back-door paths].

---
class: middle

### Non-differential vs Differential misclassification

|                        | Truly Exposed | Truly Unexposed |
|------------------------|:-------------:|:---------------:|
|Classified as Exposed   | A             | B               |
|Classified as Unexposed | C             | D               |

--

<br>

.pull-left[
- **Sensitivity (SE)** = `\(\frac{A}{A+C}\)`
- **Specificity (SP)** = `\(\frac{D}{B+D}\)`
]

.pull-right[
- **False negative proportion (FN)** = `\(\frac{C}{A+C}\)`
- **False positive proportion (FP)** = `\(\frac{B}{B+D}\)`
]

---
class: middle

### Non-differential vs Differential misclassification

|                        | Truly Exposed | Truly Unexposed |
|------------------------|:-------------:|:---------------:|
|Classified as Exposed   | A             | B               |
|Classified as Unexposed | C             | D               |

<br>

--

.pull-left[
- **Positive predictive value (PPV)** = `\(\frac{A}{A+B}\)`
- **Negative predictive value (NPV)** = `\(\frac{D}{C+D}\)`
]

--

.pull-right[
- **PPV** = `\(\frac{P(E1)}{P(E1) + P(E0)\frac{1-SP}{SE}}\)`
- **NPV** = `\(\frac{P(E0)}{P(E0) + P(E1)\frac{1-SE}{SP}}\)`
]
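---
class: middle

### Classification measures in R

A small helper (hypothetical, not from the slides; the function name and the example counts are made up) that turns the truth-vs-classified 2 x 2 counts into the quantities defined above.

``` r
# A, B, C, D follow the table: rows = classified status, columns = true status
classification_measures <- function(A, B, C, D) {
  c(SE  = A / (A + C),   # Pr(classified exposed | truly exposed)
    SP  = D / (B + D),   # Pr(classified unexposed | truly unexposed)
    FN  = C / (A + C),
    FP  = B / (B + D),
    PPV = A / (A + B),   # Pr(truly exposed | classified exposed)
    NPV = D / (C + D))   # Pr(truly unexposed | classified unexposed)
}

classification_measures(A = 90, B = 20, C = 10, D = 80)
# SE = 0.90, SP = 0.80, PPV = 0.82, NPV = 0.89 (rounded)
```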
---
class: middle

### Non-differential vs Differential misclassification

"Non-differential misclassification of the exposure" means that the sensitivity and specificity .purple[(for exposure ascertainment)] are the same in the cases as they are in the controls. In this context:

- **Sensitivity** = Pr(classified as exposed | truly exposed)
- **Specificity** = Pr(classified as unexposed | truly unexposed)

--

<br>

If you have non-differential misclassification of a binary exposure, the bias _on average_ will be toward the null.

- <span style="color:darkred">This does not guarantee that the error will be toward the null.</span>

---
class: middle

### Misclassification correction by hand (ME3 Chapter 19):

Say you have a 2 x 2 table with these labels:

|      | Cases               | Controls               |
|------|---------------------|------------------------|
|E (+) | A = exposed cases   | B = exposed controls   |
|E (-) | C = unexposed cases | D = unexposed controls |

Let's call the **true** cell counts A', B', C' and D'.

--

- Then your **observed** `\(A\)` cell (after the misclassification has taken place) contains two kinds of units:
  - True positives (TP) and false positives (FP), where "positive" refers to classification as "exposed".
- Therefore the **observed** `\(A\)` cell count will contain: `\((sensitivity \times A') + ((1-specificity) \times C')\)`

---
class: middle

### Misclassification correction by hand (ME3 Chapter 19):

Likewise, the **observed** `\(C\)` cell (after misclassification has taken place) contains two kinds of units: false negatives (`FN`) and true negatives (`TN`), where "negative" is classification as "non-exposed".

Therefore the observed `\(C\)` cell count will contain: `\(((1-sensitivity) \times A') + (specificity \times C')\)` <br>

--

<br>

.purple[Similar calculations are then applied to the controls:]

Therefore the observed `B` cell count will contain: `\((sensitivity \times B') + ((1-specificity) \times D')\)`

Therefore the observed `D` cell count will contain: `\(((1-sensitivity) \times B') + (specificity \times D')\)`

---
class: middle

### Misclassification

<img src="images/misclass2by2.png" width="150%" style="display: block; margin: auto;" />

---
class: middle

#### An example using numbers from Szklo & Nieto, Exhibit 4-3 on page 124 (2nd Edition):

The original table was:

|      | Cases                    | Controls                    |
|------|--------------------------|-----------------------------|
|E (+) | A' = exposed cases (80)  | B' = exposed controls (50)  |
|E (-) | C' = unexposed cases (20)| D' = unexposed controls (50)|

.green[So true OR = (80x50) / (20x50) = 4.0]

--

- Sensitivity = 90%; Specificity = 80%

| Cases | Controls |
|:--------------------------------------------------------:|:---------------------------------------------------------:|
| Observed `A` cell = (0.90x80) + (0.20x20) = 72 + 4 = 76   | Observed `B` cell = (0.90x50) + (0.20x50) = 45 + 10 = 55   |
| Observed `C` cell = (0.10x80) + (0.80x20) = 8 + 16 = 24   | Observed `D` cell = (0.10x50) + (0.80x50) = 5 + 40 = 45    |

---
class: middle

### The observed table is therefore:

| Observed Cases             | Observed Controls             |
|:--------------------------:|:-----------------------------:|
| **A** exposed cases = 76   | **B** exposed controls = 55   |
| **C** unexposed cases = 24 | **D** unexposed controls = 45 |

`\(OR = (76 \times 45) / (24 \times 55) = 2.6\)`; .blue[So observed OR = 2.6]

In terms of magnitude of bias: `\(|ln(CoOR)| = |ln(\frac{OR_{True}}{OR_{Obs}})|\)`

.purple[This is a change of |ln(CoOR)| = |ln(4/2.6)| = |ln(1.538)| = 0.43, i.e., about 43% on the log-OR scale.]

---
class: middle

### Simple correction for misclassification

Using the [`episensr` package](https://github.com/dhaine/episensr)

``` r
library(episensr)
mc <- misclassification(matrix(c(76, 55, 24, 45),
                               dimnames = list(c("Exposed", "Unexposed"),
                                               c("Cases", "Controls")),
                               nrow = 2, byrow = TRUE),
                        type = "exposure",
                        bias_parms = c(.9, .9, .8, .8))
```

--

```
## --Observed data--
## Outcome: Exposed
## Comparing: Cases vs. Controls
##
##           Cases Controls
## Exposed      76       55
## Unexposed    24       45
##
##                                     2.5%    97.5%
## Observed Relative Risk: 1.381818 1.121523 1.702526
##    Observed Odds Ratio: 2.590909 1.415073 4.743791
## ---
##                                                     2.5%     97.5%
## Misclassification Bias Corrected Relative Risk: 1.584726
##    Misclassification Bias Corrected Odds Ratio: 4.439562 1.508274 13.067726
```

---
class: middle

#### Correction for misclassification

`episensr` package [Tutorial](https://rdrr.io/cran/episensr/f/vignettes/episensr.Rmd)

[Sensitivity Analysis of Misclassification: A Graphical and a Bayesian Approach](https://doi.org/10.1016/j.annepidem.2006.04.001)

``` r
me.smk <- misclassification(matrix(c(126, 92, 71, 224),
                                   dimnames = list(c("Case", "Control"),
                                                   c("Smoking +", "Smoking - ")),
                                   nrow = 2, byrow = TRUE),
                            type = "exposure",
                            bias_parms = c(0.94, 0.94, 0.97, 0.97))
```

--
.pull-left[

```
## --Observed data--
## Outcome: Case
## Comparing: Smoking + vs. Smoking -
##
##         Smoking + Smoking -
## Case          126        92
## Control        71       224
##
##                                     2.5%    97.5%
## Observed Relative Risk: 2.196866 1.796016 2.687181
##    Observed Odds Ratio: 4.320882 2.958402 6.310846
## ---
##                                                     2.5%    97.5%
## Misclassification Bias Corrected Relative Risk: 2.377254
##    Misclassification Bias Corrected Odds Ratio: 5.024508 3.282534 7.690912
```
]

--

.pull-right[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-30-1.png" width="50%" style="display: block; margin: auto;" />
]

---
class: middle

**.blue[But how do you go from the OBSERVED table back to the TRUE table by hand?]**

The answer is that the same formulas work in reverse: here `A`, `B`, `C`, and `D` denote the true counts, so solving each pair of linear equations recovers them from the observed counts.

**Cases**

- Observed `A` cell = `\((Se|Cases \times A) + ((1-Spec|Cases) \times C)\)`
- Observed `C` cell = `\(((1-Se|Cases) \times A) + (Spec|Cases \times C)\)`

<br>

--

<br>

**Controls**

- Observed `B` cell = `\((Se|Controls \times B) + ((1-Spec|Controls) \times D)\)`
- Observed `D` cell = `\(((1-Se|Controls) \times B) + (Spec|Controls \times D)\)`

- One simple way to obtain a confidence interval in any such calculation would be to [bootstrap the variance](https://cran.r-project.org/web/packages/episensr/vignettes/episensr.html)
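---
class: middle

### Back-correction as a linear system

A minimal sketch (my own, reusing the Szklo & Nieto numbers from earlier) of that reverse calculation: stack the two formulas for each group into a misclassification matrix and invert it with `solve()`.

``` r
se <- 0.90; sp <- 0.80
M <- matrix(c(se,     1 - sp,
              1 - se, sp),
            nrow = 2, byrow = TRUE)  # maps (true exposed, true unexposed) -> observed

solve(M, c(76, 24))  # cases:    recovers the true A' = 80, C' = 20
solve(M, c(55, 45))  # controls: recovers the true B' = 50, D' = 50
```

With sensitivity 0.90 and specificity 0.80 this recovers the original table, i.e., the true OR of 4.0.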
---
class: middle

### Measurement Error: Key messages and some illustrations

- Could affect **ANY** epidemiological research, independent of the sample size.
- The direction of the NDM bias is .red[NOT ALWAYS towards the null (it depends!)]
- The error and .purple[its structure (differential vs non-differential) could be carried over] (e.g., from continuous to categorical variables).
- The error could affect the .blue[effect estimate, the precision, and features of the data].

--

.small[
+ Brooks DR, et al. The Impact of Joint Misclassification of Exposures and Outcomes on the Results of Epidemiologic Research. Current Epidemiology Reports. 2018;5(2):166-74.
+ van Smeden M, et al. Reflection on modern methods: five myths about measurement error in epidemiological research. International Journal of Epidemiology. 2020;49(1):338-47.
]

---
class: middle

### Simulation Extrapolation (SIMEX)

.red[If you know the sensitivity and specificity parameters for the misclassification]: Misclassification Simulation Extrapolation [MC-SIMEX](https://rdrr.io/cran/simex/man/mcsimex.html) is a method to update the effect estimate, using a regression-based correction:

--

<img src="images/mc_simex_text.png" width="110%" style="display: block; margin: auto;" />

---
class: middle

### Misclassification Simulation Extrapolation (MC-SIMEX)

<img src="images/mcsimex0.png" width="100%" style="display: block; margin: auto;" />

---
class: middle

### Misclassification Simulation Extrapolation (MC-SIMEX)

<img src="images/mcsimex1.png" width="100%" style="display: block; margin: auto;" />

---
class: middle

### Misclassification Simulation Extrapolation (MC-SIMEX)

<img src="images/mcsimex2.png" width="100%" style="display: block; margin: auto;" />

.purple[*Indicates that the direction of the bias will depend on the structure of the mismeasured parameters.]

---
class: middle

### [MC-SIMEX (Simulation Extrapolation)](https://rdrr.io/cran/simex/man/mcsimex.html)

``` r
require(simex); set.seed(704); n1 = 1000
E <- rbinom(n1, 1, 0.5)   # parameters for E
Y <- rbinom(n1, 1, 0.43); Y <- factor(Y, labels = c("Outcome-", "Outcome+"))
me.dat <- data.frame(E = E, Y = Y)
attach(me.dat)

# misclassification matrix for the outcome
*ydx <- matrix(data = c(0.85, 0.15, 0.15, 0.85), nrow = 2, byrow = FALSE)
dimnames(ydx) <- list(levels(Y), levels(Y))
*me.dat$dxsimexstar1 <- misclass(data.frame(Y), list(Y = ydx), k = 1)[, 1]
```

--

.pull-left[

```
## 
## Outcome- Outcome+ 
##      536      464
```

```
##          Outcome- Outcome+
## Outcome-     0.85     0.15
## Outcome+     0.15     0.85
```

```
## 
## Outcome- Outcome+ 
##      547      453
```
]

--

.pull-right[

``` r
# fitting the models
naive.modR0 <- glm(Y ~ E, binomial(link = "log"))
#summary(naive.modR0)
```

```
##             exp(Est.) 2.5% 97.5% z val.    p
## (Intercept)      0.46 0.42  0.50 -16.26 0.00
## E                1.02 0.89  1.17   0.32 0.75
```
]

---
class: middle

### Non-Differential Misclassification of the Outcome

``` r
*mod.simexR0 <- mcsimex(naive.modR0, mc.matrix = ydx, SIMEXvariable = "Y", asymptotic = F)
#summary(mod.simexR0)
cbind(naive = round(naive.modR0$coefficients, 2),
      corrected = round(mod.simexR0$coefficients, 2),
      naiveRR = round(exp(naive.modR0$coefficients), 2),
      correctedRR = round(exp(mod.simexR0$coefficients), 2))
```

```
##             naive corrected naiveRR correctedRR
## (Intercept) -0.78     -0.81    0.46        0.45
## E            0.02      0.01    1.02        1.01
```

--

``` r
#plot(mod.simexR0)
modcorrectedR0 <- data.frame(mod.simexR0[["SIMEX.estimates"]])
```

---
class: middle

### Non-Differential Misclassification of the Outcome

.pull-left[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-41-1.png" width="120%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-42-1.png" width="120%" style="display: block; margin: auto;" />
]

---
class: middle

### Differential Misclassification of the Outcome (1)

```
## 
## Outcome- Outcome+ 
##      536      464
```

```
##          Outcome- Outcome+
## Outcome-     0.65     0.25
## Outcome+     0.35     0.45
```

```
## Outcome- Outcome+ 
##      490      510
```

```
##             naive corrected naiveRR correctedRR
## (Intercept) -0.78     -0.92    0.46        0.40
## E            0.02      0.07    1.02        1.07
```

--

.pull-left[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-44-1.png" width="80%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-45-1.png" width="80%" style="display: block; margin: auto;" />
]

---
class: middle

### Differential Misclassification of the Outcome (2)

```
## 
## Outcome- Outcome+ 
##      536      464
```

```
##          Outcome- Outcome+
## Outcome-      0.6     0.30
## Outcome+      0.4     0.65
```

```
## Outcome- Outcome+ 
##      461      539
```

```
##             naive corrected naiveRR correctedRR
## (Intercept) -0.78     -1.02    0.46        0.36
## E            0.02      0.04    1.02        1.04
```

--

.pull-left[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-47-1.png" width="80%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="L20_EPIB704_Measurement_Error_st_files/figure-html/unnamed-chunk-48-1.png" width="80%" style="display: block; margin: auto;" />
]

---
class: middle

### Simulation Extrapolation (SIMEX); Misclassification Simulation Extrapolation (MC-SIMEX)

[MC-SIMEX (Simulation Extrapolation)](https://rdrr.io/cran/simex/man/mcsimex.html)

.small[
Cook, J.R. and Stefanski, L.A. (1994). Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association, 89, 1314-1328.
Küchenhoff, H., Lederer, W. and Lesaffre, E. (2006). Asymptotic variance estimation for the misclassification SIMEX. Computational Statistics and Data Analysis, 51, 6197-6211.

Lederer, W. and Küchenhoff, H. (2006). A short introduction to the SIMEX and MCSIMEX. R News, 6(4), 26-31.

Carroll, R.J., Küchenhoff, H., Lombard, F. and Stefanski, L.A. (1996). Asymptotics for the SIMEX estimator in nonlinear measurement error models. Journal of the American Statistical Association, 91, 242-250.
]

---
class: middle

### Additional Resources

[Results on Differential and Dependent Measurement Error of the Exposure and the Outcome Using Signed Directed Acyclic Graphs](https://doi.org/10.1093/aje/kwr458)

[Common structures of Bias](https://cran.r-project.org/web/packages/ggdag/vignettes/bias-structures.html)

[Bias Analysis Gone Bad](https://doi.org/10.1093/aje/kwab072)

[Hierarchical semi-Bayes methods for misclassification in perinatal epidemiology](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792373/)

[Adaptive Validation Design: A Bayesian Approach to Validation Substudy Design With Prospective Data Collection](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7269021/)

---
class: middle

### QUESTIONS?

## COMMENTS?

# RECOMMENDATIONS?

---
class: middle

### Appendices

I. Miscellaneous

II. Theoretical information from H&R What If?

III. Extra examples

---
class: middle

### Is bias really a bad thing?

Many biased regression techniques have been introduced into statistical and epidemiologic practice, and are helpful for dealing with problems of "multicollinearity" and for integrating prior knowledge into estimation.

- E.g., matching in case-control studies, James-Stein shrinkage, empirical Bayes estimation, ridge regression, principal components regression, and hierarchical (i.e., multilevel) modeling.

So, sometimes, a biased estimate is not such a bad thing. [`Kaufman JS. Why are we biased against bias? IJE 2008;37(3):624-6.`](https://academic.oup.com/ije/article/37/3/624/746437)

- [Correction of Selection Bias in Survey Data: Is the Statistical Cure Worse Than the Bias? Hanley JA. Am J Public Health. 2017;107(4):503-505](https://ajph.aphapublications.org/doi/10.2105/AJPH.2016.303644?url_ver=Z39.88-2003&rfr_id=ori%3Arid%3Acrossref.org&rfr_dat=cr_pub++0pubmed)

--

**BUT** keep in mind that all of the expressions shown about random error are expectations, so they deal with averages over many iterations. The expected value is therefore less relevant for an epidemiologist who must infer from the results of only one study.
.pull-right[**Judge it yourself :)**]

---
class: middle

### Stuff we know, and are aware of: Exposure assessment

Unlike prospective cohort studies, case-control studies use retrospective exposure assessment:

- Disease status is ascertained in the present
- Exposure status is ascertained for the past
- Exposure assessment procedures must be comparable (and preferably identical) for cases and controls

| Modes of Data Collection | Cautions |
|:------------------------:|:--------:|
| Interviews | - Interviewer-respondent relationship <br> - Incorrect recall (aids for remembering, such as visual aids, may help) |
| Medical/Other Records | - Information can be incomplete or less detailed <br> - Exposures may be recorded non-systematically |
| Physical/Biological Measures | - Assays/measures may be affected by disease status or time |

[Epidemiology by Design](https://academic.oup.com/book/32358/chapter/268624216)

---
class: middle

### Stuff we know: non-differential misclassification of the exposure

_Misascertainment_ of the exposure in a random fashion.

- The measurement error is "non-differential" because the probability of being misclassified is independent of the true outcome status (at random).
- In the case of two categories, it will tend (on average) to move the effect estimate toward the null value of the effect in question (i.e., toward 0 if RD, or 1 if RR).
- The **bias toward the null** (e.g., an RD closer to zero) arises because moving individuals randomly from exposed to unexposed (or vice versa, or both at once) will tend (**_on average_**) to make the two groups more similar.
- In the extreme case, where exposed/unexposed assignment is at random irrespective of true exposure, the two groups will have the same expected outcome proportion and we will **wrongly** detect no difference between them.
- Unsurprisingly, as two groups become more similar due to misclassification, the differences between them tend to be smaller.

[Epidemiology by Design](https://academic.oup.com/book/32358/chapter/268624216)

---
class: middle

### Stuff we know, and are aware of: Information bias

The case of recall bias in case-control studies:

- Since assessment of exposure comes after case status is known to the case, we can have recall bias.
- Cases may over-recall their exposures (e.g., to chemicals or other toxic agents), relative to what the same person would recall as a non-case.
  - Example: well-known birth defect cases and pregnancy exposures.
  - Not an issue if exposure is based on analysis of pre-collected data or biosamples.
  - Related: do the cases perceive a relationship between disease & exposure?
- Without blinding as to outcome status, investigators may classify exposures differently.
- Investigators may probe cases more deeply than controls.

**All of these may lead to differential misclassification.**

**<span style="color:red">_Misascertainment_ of the exposure NOT at random.</span>**

---
class: middle

### Tendencies are stochastic.

[Sorahan T, Gilthorpe MS. Non-differential misclassification of exposure always leads to an underestimate of risk: an incorrect conclusion. Occup Environ Med 1994;51:839-840.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1128126/)

Those authors simulated 9 scenarios with non-differential misclassification of the exposure; see the sketch after this slide.

- 5000 simulation runs per scenario.
- Depending on the scenario, between 3% and 34% of the runs showed a risk ratio FURTHER from the null due to (theoretically) non-differential misclassification.
- This is because "non-differential" is in expectation.
- In reality, theoretically non-differential misclassification can manifest as differential, and push effect estimates further from the null.

**So be careful even with this assumption. It's not a sure thing!**

[Epidemiology by Design](https://academic.oup.com/book/32358/chapter/268624216)
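---
class: middle

### Tendencies are stochastic: a simulation sketch

A minimal R sketch in the spirit of Sorahan & Gilthorpe (my own illustration, not their code; the sample size, risks, and sensitivity/specificity values are arbitrary choices): the misclassification below is truly non-differential, yet some individual runs still land FURTHER from the null.

``` r
set.seed(704)
se <- 0.8; sp <- 0.8; n <- 200
true_rr <- 2                          # risk 0.4 in exposed vs 0.2 in unexposed

further <- replicate(5000, {
  E <- rbinom(n, 1, 0.5)                        # true exposure
  Y <- rbinom(n, 1, ifelse(E == 1, 0.4, 0.2))   # outcome
  E_star <- ifelse(E == 1,                      # misclassified exposure,
                   rbinom(n, 1, se),            # generated independently of Y,
                   rbinom(n, 1, 1 - sp))        # i.e., non-differential
  rr_obs <- mean(Y[E_star == 1]) / mean(Y[E_star == 0])
  rr_obs > true_rr                    # further from the null (since RR > 1)
})
mean(further)   # a non-trivial share of runs overshoots the true RR
```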
---
class: middle

### II- Theoretical information.

**1 - Note on: H&R What If?**

9.4 Adherence to treatment in randomized experiments. Figure 9.12 is the same as Figure 9.11, except with the addition of the "exclusion restriction" (Technical Point 9.2).

9.5 The intention-to-treat effect and the per-protocol effect. The per-protocol effect is the causal effect of treatment if all individuals had adhered to their assigned treatment as indicated in the protocol of the randomized experiment. Therefore it is something estimated under assumptions, not directly observable in the data. The causal effect of randomized assignment is the intention-to-treat effect (ITT).

---
class: middle

### Theoretical information.

**2 - Notes from: H&R What If?**

- Fine Point 9.2: The ITT can only be calculated when there is no loss to follow-up. If some participants are censored, one must use IPCW to adjust for selection bias. The ITT is generally considered conservative, and is unbiased under the null. The argument that it is conservative is based on an assumption of monotonicity.
- Fine Point 9.4: "Efficacy" versus "Effectiveness". Obviously Hernán and Robins are not especially fond of the ITT.
- Extra: [Measurement Error in Causal Inference](https://www.taylorfrancis.com/chapters/edit/10.1201/9781315101279-21/measurement-error-causal-inference-linda-valeri)

---
class: middle

### III. Some extra illustrations

``` r
library(episensr)
misclassification(matrix(c(215, 1449, 668, 4296),
                         dimnames = list(c("Breast cancer+", "Breast cancer-"),
                                         c("Smoker+", "Smoker-")),
                         nrow = 2, byrow = TRUE),
                  type = "exposure",
                  bias_parms = c(.78, .78, .99, .99))
```

```
## --Observed data--
## Outcome: Breast cancer+
## Comparing: Smoker+ vs. Smoker-
##
##                Smoker+ Smoker-
## Breast cancer+     215    1449
## Breast cancer-     668    4296
##
##                                       2.5%     97.5%
## Observed Relative Risk: 0.9653825 0.8523766 1.0933704
##    Observed Odds Ratio: 0.9542406 0.8092461 1.1252141
## ---
##                                                      2.5%     97.5%
## Misclassification Bias Corrected Relative Risk: 0.9614392
##    Misclassification Bias Corrected Odds Ratio: 0.9490695 0.7895687 1.1407909
```

---
class: middle

``` r
misclassification(matrix(c(4558, 3428, 46305, 46085),
                         dimnames = list(c("AMI death+", "AMI death-"),
                                         c("Male+", "Male-")),
                         nrow = 2, byrow = TRUE),
                  type = "outcome",
                  bias_parms = c(.53, .53, .99, .99))
```
```
## --Observed data--
## Outcome: AMI death+
## Comparing: Male+ vs. Male-
##
##            Male+ Male-
## AMI death+  4558  3428
## AMI death- 46305 46085
##
##                                     2.5%    97.5%
## Observed Relative Risk: 1.294347 1.240431 1.350607
##    Observed Odds Ratio: 1.323321 1.263639 1.385822
## ---
##
## Misclassification Bias Corrected Relative Risk: 1.344039
##    Misclassification Bias Corrected Odds Ratio: 1.406235
```

---
class: middle

## Options for design of an internal validation study

|Design strategy |Bias parameters |
|----------------|----------------|
|Select individuals based on the misclassified exposure measurement | PPV, NPV |
|Select individuals based on true exposure status | SE, SP |
|Select a random sample of individuals | SE, SP, PPV, NPV |

[Applying Quantitative Bias Analysis to Epidemiologic Data](https://drive.google.com/file/d/1Y_A5mynST_bOUjiA7sYAIOEw8B8RzsL-/view)

---
class: middle

## <span style="color:darkred">Joint Outcome and Covariates Misclassification???</span>

[Brooks, D.R., Getz, K.D., Brennan, A.T., et al. The Impact of Joint Misclassification of Exposures and Outcomes on the Results of Epidemiologic Research. Curr Epidemiol Rep 5, 166-174 (2018)](https://doi.org/10.1007/s40471-018-0147-y)
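---
class: middle

### Validation parameters in practice: a sketch

Tying back to the internal-validation options above: if a validation substudy yields PPV and NPV (the values below are hypothetical, as is the reuse of the earlier observed case counts), the expected TRUE counts follow directly, without needing SE and SP.

``` r
ppv <- 0.90; npv <- 0.85          # hypothetical estimates from a validation sample
obs_exposed_cases   <- 76
obs_unexposed_cases <- 24

# Expected true counts among cases: reweight each observed cell by its
# probability of being correctly (or incorrectly) classified.
true_exposed_cases   <- ppv * obs_exposed_cases + (1 - npv) * obs_unexposed_cases
true_unexposed_cases <- (1 - ppv) * obs_exposed_cases + npv * obs_unexposed_cases
c(exposed = true_exposed_cases, unexposed = true_unexposed_cases)
```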