Last updated: 2024-09-16
Checks: 7 0
Knit directory: PODFRIDGE/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20230302)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 6176bd3. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/.DS_Store
Ignored: output/.DS_Store
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/regression.Rmd
) and HTML
(docs/regression.html
) files. If you’ve configured a remote
Git repository (see ?wflow_git_remote
), click on the
hyperlinks in the table below to view the files as they were in that
past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 6176bd3 | tinalasisi | 2024-09-16 | Cleaning up repo and adding license. |
html | 6176bd3 | tinalasisi | 2024-09-16 | Cleaning up repo and adding license. |
html | 5d7baf2 | hcvw | 2024-09-06 | Build site. |
html | a492d9c | hcvw | 2024-09-06 | Build site. |
html | a60d243 | hcvw | 2024-09-05 | Build site. |
Rmd | 457d560 | hcvw | 2024-09-05 | wflow_publish(c("analysis/regression.Rmd")) |
html | 0cc6e3a | hcvw | 2024-08-14 | Build site. |
Rmd | bcf9628 | hcvw | 2024-08-14 | wflow_publish(c("analysis/regression.Rmd")) |
html | 45906fc | GitHub | 2024-08-14 | Add files via upload |
Rmd | 9728f48 | GitHub | 2024-08-14 | Update regression.Rmd |
html | e6972d6 | hcvw | 2024-06-25 | Build site. |
html | 1eb6d2c | hcvw | 2024-06-25 | Build site. |
Rmd | 96b197a | hcvw | 2024-06-25 | wflow_publish(c("analysis/regression.Rmd")) |
To estimate the number of people in CODIS by state and race, we need information on (1) the number of people in CODIS in each state, and (2) the racial composition of this number. We have several data sources, each of which provide different information for different states:
To leverage this data to make estimates on the number of people in
CODIS by state, we separate states into three categories:
1. States who have data available in the Murphy & Tong dataset. For
these states, no calculations are needed to estimate the number of
people in CODIS.
2. States who have NDIS and SDIS data available but are not in the
Murphy & Tong dataset.
3. States with only NDIS data available.
The plot below shows the data that is available for each state:
Version | Author | Date |
---|---|---|
6176bd3 | tinalasisi | 2024-09-16 |
For states in categories (2) and (3), we need to generate an estimation of the racial composition of the SDIS profiles. To generate this estimation, we use the Murphy & Tong FOIA data to create a regression model of proportion of Black and White people in the data set with the following independent variables: the U.S. census proportion of each state for each rate, the percent of the state’s prison population that is each race, an indicator variable for Black/White race, and interaction variables for census proportion by race and prison racial population by race. We use the coefficients from the regression model to make predictions for the remaining states that do not have data available on the racial composition of the DNA databases.
\[Proportion_{race} = \beta_0 + \beta_1census_{proportion} + \beta_2prison_{proportion} + \beta_3race + \beta_4race*census_{proportion} + \beta_5race*prison_{proportion}\]
For states in category (3) we need both an estimation of the number of people in SDIS, and an estimation of the number of the racial composition of the datasets. To estimate the racial composition of the database, we use the regression model described above. To generate predictions of the number of people in SDIS for these states, we create an additional regression model with dependent variable the number of people in SDIS and independent variables for the U.S. census proportion of the population that is each race, the proportion of the state’s prison population that is each race, and the number of people in NDIS for that state. We create separate regression models for arrests and offenders to obtain more accurate predictions:
\[N_{arrestee} = \beta_0 + \beta_1census_{black} + \beta_2census_{white} + \beta_3prison_{black} + \beta_4prison_{white} + \beta_5NDIS_{arrestees} \] and
\[N_{offender} = \beta_0 + \beta_1census_{black} + \beta_2census_{white} + \beta_3prison_{black} + \beta_4prison_{white} + \beta_5NDIS_{offenders} \]
The following plot shows the coefficient estimates for the regression model that estimates the racial composition of the CODIS dataset using the Murphy & Tong states.
Version | Author | Date |
---|---|---|
6176bd3 | tinalasisi | 2024-09-16 |
While none of the coefficients were significant in the regression, our model had an \(R^2\) value of 0.93, demonstrating a good fit. The following plot showing the estimated racial composition using the regression model vs. the true values for the states with available data. We also plot a difference plot (also known as a Bland-Altman plot).
The following plots show the results of the SDIS regression for both arrestees (left) and offenders (right):
Version | Author | Date |
---|---|---|
6176bd3 | tinalasisi | 2024-09-16 |
Using the above regression models, we make predictions of the number of Black and White people in CODIS by state. The plot below shows our estimates for each state, colored by the data source used for each state. The number of Black people in the database are indicated with circles and the number of White people is indicated by triangles.
Version | Author | Date |
---|---|---|
6176bd3 | tinalasisi | 2024-09-16 |
We additionally generate a plot showing the difference in the percent
of people in CODIS that are Black versus the U.S. census percent of the
population that is Black in each state.
The table containing the estimates for the number of people of each race in CODIS by state, along with the source of the data is below:
State Black Profiles White Profiles Source
1 Alabama 181118.5149 217600.769 SDIS+regression
2 Alaska 5549.5882 39296.292 Regression only
3 Arizona 58944.9246 257202.483 Regression only
4 Arkansas 4031.2855 8680.932 SDIS+regression
5 California 473373.9990 819407.624 Murphy
6 Colorado 57866.4208 262106.908 SDIS+regression
7 Connecticut 5751.2303 7802.577 SDIS+regression
8 Delaware 4539.1416 5410.328 Regression only
9 Florida 475434.7840 829309.538 Murphy
10 Georgia 206442.0982 181496.282 SDIS+regression
11 Hawaii 1242.7095 11370.778 Regression only
12 Idaho 963.4824 34101.874 SDIS+regression
13 Illinois 253303.7558 305785.832 SDIS+regression
14 Indiana 80005.6400 215399.800 Murphy
15 Iowa 21824.2069 119945.100 Regression only
16 Kansas 44190.1327 210909.471 Regression only
17 Kentucky 41798.7993 217311.612 Regression only
18 Louisiana 432013.8926 338844.041 SDIS+regression
19 Maine 1281.0330 30482.016 Murphy
20 Maryland 78263.2834 46862.137 Regression only
21 Massachusetts 33450.8754 94768.533 Regression only
22 Michigan 230192.9750 376683.192 Regression only
23 Minnesota 37914.2174 115906.979 SDIS+regression
24 Mississippi 77616.9619 62226.769 SDIS+regression
25 Missouri 103504.8306 309407.963 SDIS+regression
26 Montana 641.1110 28312.414 SDIS+regression
27 Nebraska 8699.6015 34027.310 Regression only
28 Nevada 42937.8560 116401.844 Murphy
29 New Hampshire 688.2462 14974.579 Regression only
30 New Jersey 12389.6261 11436.627 SDIS+regression
31 New Mexico 8742.4332 138405.866 Regression only
32 New York 249297.7042 253243.373 Regression only
33 North Carolina 151156.0304 193565.650 SDIS+regression
34 North Dakota 3481.9052 28239.746 Regression only
35 Ohio 313483.4690 653979.072 Regression only
36 Oklahoma 44903.8916 149971.344 Regression only
37 Oregon 13690.7917 202325.147 Regression only
38 Pennsylvania 156674.3833 288601.601 Regression only
39 Rhode Island 5961.4416 8911.275 SDIS+regression
40 South Carolina 91029.7856 91493.358 SDIS+regression
41 South Dakota 4056.0000 45156.800 Murphy
42 Tennessee 156539.3105 321673.841 Regression only
43 Texas 279646.6350 358447.405 Murphy
44 Utah 6663.1740 106665.663 Regression only
45 Vermont 680.5597 12917.961 Regression only
46 Virginia 218711.3900 275687.781 SDIS+regression
47 Washington 31472.5629 235805.829 SDIS+regression
48 West Virginia 4234.0180 47855.252 SDIS+regression
49 Wisconsin 95752.7058 279276.555 Regression only
50 Wyoming 668.1218 18897.896 Regression only
Finally, we generate side-by-side pie charts for each state showing the racial composition according to the census (left) versus the estimated racial composition of CODIS (right) for each state. Note that groups not identifying as Black or White are omitted for easy comparison.
Version | Author | Date |
---|---|---|
6176bd3 | tinalasisi | 2024-09-16 |
[1] Murphy, Erin, and Jun H. Tong. “The racial composition of forensic DNA databases.” Calif. L. Rev. 108 (2020): 1847.
[2] Klein, Brennan, et al. “COVID-19 amplified racial disparities in the US criminal legal system.” Nature 617.7960 (2023): 344-350.
[3] https://le.fbi.gov/science-and-lab/biometrics-and-fingerprints/codis/codis-ndis-statistics
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Detroit
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sf_1.0-17 viridis_0.6.5 viridisLite_0.4.2 cowplot_1.1.3
[5] tidycensus_1.6.5 sandwich_3.1-1 ggpubr_0.6.0 jtools_2.3.0
[9] knitr_1.48 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[13] dplyr_1.1.4 purrr_1.0.2 tidyr_1.3.1 tibble_3.2.1
[17] ggplot2_3.5.1 tidyverse_2.0.0 readr_2.1.5 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 farver_2.1.2 fastmap_1.2.0
[4] promises_1.3.0 digest_0.6.36 timechange_0.3.0
[7] lifecycle_1.0.4 processx_3.8.4 magrittr_2.0.3
[10] compiler_4.4.1 rlang_1.1.4 sass_0.4.9
[13] tools_4.4.1 utf8_1.2.4 yaml_2.3.9
[16] ggsignif_0.6.4 labeling_0.4.3 curl_5.2.1
[19] classInt_0.4-10 xml2_1.3.6 KernSmooth_2.23-24
[22] abind_1.4-8 withr_3.0.0 grid_4.4.1
[25] fansi_1.0.6 git2r_0.33.0 e1071_1.7-14
[28] colorspace_2.1-0 future_1.33.2 globals_0.16.3
[31] scales_1.3.0 cli_3.6.3 crayon_1.5.3
[34] rmarkdown_2.27 generics_0.1.3 rstudioapi_0.16.0
[37] httr_1.4.7 tzdb_0.4.0 proxy_0.4-27
[40] DBI_1.2.3 cachem_1.1.0 pander_0.6.5
[43] splines_4.4.1 rvest_1.0.4 parallel_4.4.1
[46] tigris_2.1 vctrs_0.6.5 jsonlite_1.8.8
[49] carData_3.0-5 car_3.1-2 callr_3.7.6
[52] hms_1.1.3 rstatix_0.7.2 listenv_0.9.1
[55] jquerylib_0.1.4 units_0.8-5 glue_1.7.0
[58] parallelly_1.37.1 codetools_0.2-20 ps_1.7.7
[61] stringi_1.8.4 gtable_0.3.5 later_1.3.2
[64] broom.mixed_0.2.9.5 munsell_0.5.1 furrr_0.3.1
[67] pillar_1.9.0 rappdirs_0.3.3 htmltools_0.5.8.1
[70] R6_2.5.1 rprojroot_2.0.4 evaluate_0.24.0
[73] lattice_0.22-6 highr_0.11 backports_1.5.0
[76] broom_1.0.6 httpuv_1.6.15 bslib_0.7.0
[79] class_7.3-22 uuid_1.2-0 Rcpp_1.0.12
[82] gridExtra_2.3 nlme_3.1-164 whisker_0.4.1
[85] xfun_0.45 fs_1.6.4 zoo_1.8-12
[88] getPass_0.2-4 pkgconfig_2.0.3