Last updated: 2021-03-03

Checks: 2 passed, 0 failed

Knit directory: fa_sim_cal/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 4260c26. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .tresorit/
    Ignored:    data/VR_20051125.txt.xz
    Ignored:    output/blk_char.fst
    Ignored:    output/ent_blk.fst
    Ignored:    output/ent_cln.fst
    Ignored:    output/ent_raw.fst
    Ignored:    renv/library/
    Ignored:    renv/staging/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/index.Rmd) and HTML (docs/index.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 4260c26 Ross Gayler 2021-03-03 Add section header for overview documents in index
html 67e6fdf Ross Gayler 2021-03-03 Build site.
Rmd 5b5369f Ross Gayler 2021-03-03 Add workflow management notes
Rmd 55ee0b1 Ross Gayler 2021-02-27 end of day
html 0d30c5b Ross Gayler 2021-01-26 Build site.
html 23d740f Ross Gayler 2021-01-24 Build site.
Rmd 6df8db7 Ross Gayler 2021-01-24 End of day
html 0c48ca5 Ross Gayler 2021-01-17 Build site.
Rmd 3052780 Ross Gayler 2021-01-17 Add 02-1 block vars
html 0405e0b Ross Gayler 2021-01-15 Build site.
Rmd 00a9ff4 Ross Gayler 2021-01-15 wflow_publish(c("analysis/index.Rmd"))
html 5ab5dc4 Ross Gayler 2021-01-15 Build site.
Rmd c674a51 Ross Gayler 2021-01-15 Add 01-6 clean vars
html c674a51 Ross Gayler 2021-01-15 Add 01-6 clean vars
Rmd 874917f Ross Gayler 2021-01-13 Revise conclusion re inserted 5 in name
html c0c5313 Ross Gayler 2021-01-13 Build site.
html 22f0b81 Ross Gayler 2021-01-13 Build site.
Rmd d3deb84 Ross Gayler 2021-01-13 Add 01-5 check name
html 44538d8 Ross Gayler 2021-01-12 Build site.
html 8cd5fa1 Ross Gayler 2021-01-12 Build site.
html abb201f Ross Gayler 2021-01-12 Build site.
Rmd 2ae8660 Ross Gayler 2021-01-12 Add 01-4 check demog
html cb9bf70 Ross Gayler 2021-01-12 Build site.
Rmd 84d53a0 Ross Gayler 2021-01-12 Add 01-3 check resid
html 6edddf5 Ross Gayler 2021-01-12 Build site.
Rmd 6469262 Ross Gayler 2021-01-12 Add 01-2 check admin
html 6469262 Ross Gayler 2021-01-12 Add 01-2 check admin
html 4a8c170 Ross Gayler 2021-01-10 Build site.
Rmd 9c13ca8 Ross Gayler 2021-01-10 wflow_publish("analysis/index.Rmd")
html 03ad324 Ross Gayler 2021-01-05 Build site.
Rmd b03a2c1 Ross Gayler 2021-01-05 wflow_publish(c("analysis/index.Rmd", "analysis/notes.Rmd"))
html 80b360b Ross Gayler 2021-01-04 Build site.
html 856a513 Ross Gayler 2021-01-04 Build site.
html 838463a Ross Gayler 2020-12-23 Build site.
html a618d9e Ross Gayler 2020-12-23 Build site.
Rmd c6390cc Ross Gayler 2020-12-23 wflow_publish("analysis/*.Rmd")
html 36ccc82 Ross Gayler 2020-12-13 Build site.
Rmd f0a165a Ross Gayler 2020-12-13 End of day
html d5eb60b Ross Gayler 2020-12-10 Build site.
html 01b669c Ross Gayler 2020-12-10 Build site.
html 1993afa Ross Gayler 2020-12-10 Build site.
Rmd 9ea6d8a Ross Gayler 2020-12-10 Fix figure captions
html bc8c1cc Ross Gayler 2020-12-06 Build site.
Rmd c99ceff Ross Gayler 2020-12-06 First draft of proposal
html 2f9886a Ross Gayler 2020-12-05 Build site.
html 5f37c79 Ross Gayler 2020-11-30 Build site.
Rmd c2e37f3 Ross Gayler 2020-11-30 Initial ndex.Rmd
html c2e37f3 Ross Gayler 2020-11-30 Initial ndex.Rmd
Rmd 2a722d0 Ross Gayler 2020-11-29 end of day
html 2a722d0 Ross Gayler 2020-11-29 end of day
html 03b0a02 Ross Gayler 2020-11-04 Build site.
Rmd e163b3b Ross Gayler 2020-11-04 Start workflowr project.

This is the website for the research project “Frequency-Aware Similarity Calibration”.

If you have cloned the project to a local computer, this website is rendered in the docs subdirectory of the project directory.

If you are using workflowr to publish the research website, it is also rendered online via GitHub Pages.

This page acts as a table of contents for the website. There are links to the web pages generated from the analysis notebooks and to the rendered versions of manuscripts/documents/presentations.


Overview documents

Proposal

This notebook explains the central ideas behind the project.

Notes

This notebook is for keeping notes on any points that may be useful for later project or manuscript development and that are either not covered in the analysis notebooks or at risk of getting lost in them.

Workflow management

This project uses the targets and workflowr packages to manage the project workflow (making sure that the dependencies between computational steps are satisfied). When this work was started, there were no easily found examples of using targets and workflowr together. This notebook contains notes on the proposed workflow for using the two packages together.


Manuscripts

Links to rendered manuscripts and presentations will go here.


Analysis notebooks

01 Read, check, and standardise the entity data

Initial data preparation of imported entity records.

01-1 Get, subset, check, and save data

Import the raw data, cut it back to the subset of rows and columns that are possibly useful, sanity check the data, and save the data in an R-friendly format.

01-2 Check administrative variables

Check the “administrative” variables. These are the data relating to the administration of voter registration.

01-3 Check residence variables

Check the residence variables: residential address and phone number.

01-4 Check demographic variables

Check the demographic variables: sex, age, and birth place.

01-5 Check name variables

Check the name variables.

01-6 Clean variables

Clean all the variables.


02 Blocking variables

Examine the distributions of potential blocking variables.

02-1 Characterise blocking variables

Characterise the potential blocking variables and combinations of variables.
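
As a rough illustration of what characterising a blocking variable can involve, the following base R sketch counts the blocks, the largest block, and the within-block record pairs induced by one variable or by a combination of variables. The toy records, the column names, and the helper characterise_blocks are all invented for this example and are not taken from the notebook.

```r
# Toy records; the column names and values are invented for illustration only.
records <- data.frame(
  sex      = c("F", "F", "M", "M", "M", "F"),
  age_band = c("30s", "30s", "30s", "40s", "40s", "30s"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise the blocks induced by one or more variables.
characterise_blocks <- function(df, vars) {
  # One block key per record, formed by pasting the chosen variables together
  key <- do.call(paste, c(as.list(df[vars]), sep = "|"))
  sizes <- as.vector(table(key))  # number of records in each block
  list(
    n_blocks = length(sizes),
    max_size = max(sizes),
    # Record pairs that fall in the same block and so would be compared
    n_pairs  = sum(choose(sizes, 2))
  )
}

by_sex  <- characterise_blocks(records, "sex")
by_both <- characterise_blocks(records, c("sex", "age_band"))
```

Comparing n_pairs across candidate variables (and against choose(nrow(records), 2), the number of pairs with no blocking) gives one simple view of how much each candidate reduces the comparison workload.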

02-2 Make blocking variables

Construct the most promising candidate blocking variables from combinations of variables.


03 Name frequency (equality)

Detailed examination of the distributions of name frequencies induced by the string equality relation.
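
A minimal base R sketch of one reading of "name frequency induced by string equality": each record's frequency is the proportion of records whose name is exactly equal to its own. The names below are invented for illustration, and the notebook may define frequency differently.

```r
# Invented names, for illustration only.
first_name <- c("ANN", "ANNE", "ANN", "BOB", "ANN", "BOBBY")

# Number of records sharing each distinct name (exact string equality).
name_count <- table(first_name)

# Frequency attached to each record: the relative frequency of its own name.
record_freq <- as.vector(name_count[first_name]) / length(first_name)
```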


04 Name frequency (similarity)

Detailed examination of the distributions of name frequencies induced by a string similarity relation.
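
A minimal base R sketch of how a similarity relation might induce a name frequency, under the assumption that two names count as similar when their edit distance is at most some threshold. The choice of Levenshtein distance (via utils::adist), the threshold of 1, and the invented names are all assumptions made for illustration; the notebook may use a different similarity measure.

```r
# Invented names, for illustration only.
first_name <- c("ANN", "ANNE", "ANN", "BOB", "ANN", "BOBBY")

# Pairwise Levenshtein (edit) distances between the records' names.
d <- adist(first_name)

# Assumed threshold: names within max_dist edits are treated as similar.
max_dist <- 1
similar <- d <= max_dist

# Number of records whose name is similar to each record's name.
# Each record counts itself, because its distance to itself is 0.
sim_count <- rowSums(similar)
```

Unlike the equality case, the counts here depend on the threshold, and similarity is not transitive, so the similar names do not in general partition the records into disjoint groups.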


05 Similarity calibration

Detailed examination of the calibration from similarity to probability of identity match, both unconditionally and as a function of name frequency.


06 Compatibility models

Estimate multivariate compatibility models.