Last updated: 2021-01-01


Knit directory: fa_sim_cal/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20201104) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 47fd315. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .tresorit/
    Ignored:    data/VR_20051125.txt.xz
    Ignored:    output/d.fst
    Ignored:    renv/library/
    Ignored:    renv/staging/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/01_get_check_data.Rmd) and HTML (docs/01_get_check_data.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author       Date        Message
Rmd   47fd315  Ross Gayler  2021-01-01  wflow_publish("analysis/01*.Rmd")
Rmd   73eb6b5  Ross Gayler  2020-12-28  end of day
Rmd   46eb294  Ross Gayler  2020-12-26  Fix stupid merge conflict
Rmd   c2517e7  Ross Gayler  2020-12-26  end of day
html  c2517e7  Ross Gayler  2020-12-26  end of day
Rmd   3c6c7ff  Ross Gayler  2020-12-25  end of day
html  3c6c7ff  Ross Gayler  2020-12-25  end of day
html  838463a  Ross Gayler  2020-12-23  Build site.
html  a618d9e  Ross Gayler  2020-12-23  Build site.
Rmd   c6390cc  Ross Gayler  2020-12-23  wflow_publish("analysis/*.Rmd")
Rmd   01b669c  Ross Gayler  2020-12-10  Build site.
Rmd   bbb7d9d  Ross Gayler  2020-12-07  End of day
Rmd   babb874  Ross Gayler  2020-12-06  End of day

library(here)
here() starts at /home/ross/RG/projects/academic/entity_resolution/fa_sim_cal_TOP/fa_sim_cal
library(magrittr)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(stringr)
library(vroom)
library(skimr)
library(knitr)
library(glue)

Attaching package: 'glue'
The following object is masked from 'package:dplyr':

    collapse

1 Introduction

Read the data, characterise it to understand it, and check for possible gotchas.

This project uses historical voter registration data from the North Carolina State Board of Elections (NCSBE). This information is made publicly available in accordance with North Carolina state law. The Voter Registration Data page links to a folder of Voter Registration snapshots, which contains the snapshot data files and a metadata file describing their layout. At the time of writing, the snapshot files cover the years 2005 to 2020, with at least one snapshot per year. The files are ZIP compressed and relatively large, with the smallest being 572 MB after compression.

The snapshots contain many columns that are irrelevant to this project and/or prohibited under Australian privacy law (e.g. political affiliation, race). We initially read all the columns because that may help in debugging the inevitable problems reading the data. Later, the data set will be restricted to the columns that are essential for the project.

We use only one snapshot file (VR_Snapshot_20051125.zip) because this project does not investigate linkage of records across time. We chose the oldest snapshot (2005) because it is the smallest and the contents are the most out of date, minimising the current information made available. Note that this project will not generate any information that is not already directly, publicly available from NCSBE.

2 Read data

The snapshot ZIP file was downloaded, uncompressed (5.7 GB), then compressed in XZ format to minimise the size. The compressed snapshot file and the metadata file are stored in the data directory.
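The recompression step is not shown in this document. As a minimal sketch only, assuming base R is used (a command-line tool would do equally well), a text file can be streamed into an XZ-compressed copy as follows. The function name recompress_xz and the toy file are hypothetical.

```r
# Hypothetical helper: copy a file into an XZ-compressed file in chunks,
# so the whole file never has to be held in RAM.
recompress_xz <- function(src, dst, chunk_bytes = 1e6) {
  in_con <- file(src, open = "rb")
  out_con <- xzfile(dst, open = "wb", compression = 9)  # 9 = strongest preset
  on.exit({
    close(in_con)
    close(out_con)
  })
  repeat {
    buf <- readBin(in_con, what = "raw", n = chunk_bytes)
    if (length(buf) == 0L) break  # end of input
    writeBin(buf, out_con)
  }
  invisible(dst)
}

# Toy usage on a small temporary file (not the real snapshot)
src <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\tA", "2\tB"), src)
dst <- paste0(src, ".xz")
recompress_xz(src, dst)
round_trip <- readLines(xzfile(dst))  # read the compressed copy back
```

Reading the compressed copy back and comparing it with the input is a cheap way to confirm that nothing was lost.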

raw_file <- here::here("data", "VR_20051125.txt.xz") # raw input file

The cleaned data is stored as an fst format file in the output directory.

d_fst <- here::here("output", "d.fst") # temporary data file
clean_fst <- here::here("output", "clean.fst") # parsed and cleaned data as a dataframe

The data is tab-separated, not fixed-width as you might reasonably think from reading the metadata. The field widths (interpreted as maximum lengths) in the metadata are not accurate. Some fields contain values longer than the stated width.

Inspection of the raw data shows that the character fields are unquoted. However, at least one character value contains a double-quote character, which has the potential to confuse the parsing if it is looking for quoted values.

d <- vroom::vroom( # read raw data; let vroom guess the field types
  raw_file,
  delim = "\t", # assume that fields are *only* delimited by tabs
  col_names = TRUE, # use the column names on the first line of data
  na = "", # missing fields are empty string or whitespace only (see trim_ws argument)
  quote = "", # don't allow for quoted strings
  comment = "", # don't allow for comments
  trim_ws = TRUE, # trim leading and trailing whitespace
  escape_double = FALSE, # assume no escaped quotes
  escape_backslash = FALSE # assume no escaped backslashes
  )
fst::write_fst(d, d_fst, compress = 100) # save data frame (cheap-skate caching)
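The reason for switching off quote handling can be illustrated on toy data. This is a sketch only: it uses base R read.delim() rather than vroom so that it has no package dependencies, and the toy values are invented, not taken from the snapshot.

```r
# Toy tab-separated data: the value in column `b` contains a lone double quote.
toy <- "a\tb\tc\n1\tO\"BRIEN\t2\n3\tSMITH\t4\n"

# quote = "" switches quote processing off, so the lone quote is kept as data
# instead of being treated as the start of a quoted field.
unquoted <- read.delim(text = toy, quote = "", stringsAsFactors = FALSE)

nrow(unquoted)  # number of data rows recovered
unquoted$b[1]   # the value that contains the embedded quote
```

With quote processing left on, a parser may treat the lone quote as the start of a quoted field and swallow the following delimiters, which is the failure this setting avoids.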

Some of the analyses were done on a laptop with 16 GB of RAM. The data set is almost too big for that laptop, so for each section of the analysis I read only a subset of the columns from the temporary data file, delete the data frames after use, and free the RAM with a garbage collection.

d <- fst::read_fst(d_fst) %>% tibble::as_tibble() # get cached data
dim(d)
[1] 8003293      90
  • Correct number of data rows extracted: the external line count of the input file is 8,003,294, which is one header line plus 8,003,293 data rows.
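The external line count was obtained outside this document. As a sketch of how an equivalent check could be done in R, assuming the file has exactly one header line, the lines can be counted in chunks through a connection. The helper name count_lines and the toy file are hypothetical.

```r
# Hypothetical helper: count the lines of a (possibly compressed) text file
# by reading it in chunks, so the whole file never has to fit in RAM.
count_lines <- function(path, chunk_size = 100000L) {
  con <- file(path, open = "r")  # file() detects xz/gzip/bzip2 compression when reading
  on.exit(close(con))
  n <- 0  # numeric rather than integer, to avoid overflow on very long files
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0L) break
    n <- n + length(chunk)
  }
  n
}

# Toy check: one header line plus three data rows
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\tA", "2\tB", "3\tC"), tmp)
n_lines <- count_lines(tmp)
n_data_rows <- n_lines - 1  # subtract the header line
```

The value of n_data_rows is what should match the first element of dim(d).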

3 Characterise data (all records)

Take a very quick look at everything then concentrate on the columns that have a chance of being useful.

glimpse(d)
Rows: 8,003,293
Columns: 90
$ snapshot_dt              <dttm> 2005-11-25, 2005-11-25, 2005-11-25, 2005-11…
$ county_id                <dbl> 18, 7, 10, 16, 58, 60, 62, 73, 74, 87, 99, 3…
$ county_desc              <chr> "CATAWBA", "BEAUFORT", "BRUNSWICK", "CARTERE…
$ voter_reg_num            <chr> "0", "000000000000", "000000000000", "000000…
$ ncid                     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ status_cd                <chr> "R", "R", "R", "R", "R", "R", "R", "R", "R",…
$ voter_status_desc        <chr> "REMOVED", "REMOVED", "REMOVED", "REMOVED", …
$ reason_cd                <chr> "RL", "R2", "R2", "RP", "R2", "RL", "RP", "R…
$ voter_status_reason_desc <chr> "MOVED FROM COUNTY", "DUPLICATE", "DUPLICATE…
$ absent_ind               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ name_prefx_cd            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ last_name                <chr> "AARON", "THOMPSON", "WILSON", "LANGSTON", "…
$ first_name               <chr> "CHARLES", "JESSICA", "WILLIAM", "VON", "LIZ…
$ midl_name                <chr> "F", "RUTH", "B", NA, "IRENE", "R", "HUGHES"…
$ name_sufx_cd             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ house_num                <dbl> 0, 961, 0, 264, 1536, 1431, 171, 0, 0, 1000,…
$ half_code                <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ street_dir               <chr> NA, NA, NA, NA, NA, "E", NA, NA, NA, NA, NA,…
$ street_name              <chr> "ROUTE 4", "TAYLOR", "MIRROR LAKE", "CARL GA…
$ street_type_cd           <chr> NA, "RD", NA, "RD", "RD", "ST", NA, NA, NA, …
$ street_sufx_cd           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ unit_designator          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ unit_num                 <chr> "147 BA", NA, NA, NA, NA, "1", NA, NA, NA, N…
$ res_city_desc            <chr> "CONOVER", "CHOCOWINITY", "BOILING SPRING LA…
$ state_cd                 <chr> "NC", "NC", "NC", "NC", "NC", "NC", "NC", NA…
$ zip_code                 <dbl> 28613, 27817, 28461, 28570, 27892, 28204, 27…
$ mail_addr1               <chr> NA, "619A FOUNDERS HALL, CP0 # 9100", NA, NA…
$ mail_addr2               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ mail_addr3               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ mail_addr4               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ mail_city                <chr> NA, "ASHEVILLE", NA, NA, NA, NA, "CANDOR", N…
$ mail_state               <chr> NA, "NC", NA, NA, NA, NA, "NC", NA, NA, NA, …
$ mail_zipcode             <dbl> NA, 0, NA, NA, NA, NA, 27229, NA, NA, NA, NA…
$ area_cd                  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ phone_num                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ race_code                <chr> "W", "W", "U", "B", "W", "W", "W", "U", "U",…
$ race_desc                <chr> "WHITE", "WHITE", "UNDESIGNATED", "BLACK or …
$ ethnic_code              <chr> "NL", "NL", "NL", "NL", "NL", "NL", "NL", "N…
$ ethnic_desc              <chr> "NOT HISPANIC or NOT LATINO", "NOT HISPANIC …
$ party_cd                 <chr> "REP", "REP", "UNA", "DEM", "REP", "UNA", "D…
$ party_desc               <chr> "REPUBLICAN", "REPUBLICAN", "UNAFFILIATED", …
$ sex_code                 <chr> "M", "F", "U", "M", "F", "F", "M", "U", "U",…
$ sex                      <chr> "MALE", "FEMALE", "UNK", "MALE", "FEMALE", "…
$ age                      <dbl> 62, 26, 0, 58, 63, 30, 93, 0, 0, 82, 57, 72,…
$ birth_place              <chr> NA, "NC", NA, "MI", NA, "VA", "NC", NA, NA, …
$ registr_dt               <dttm> 1984-10-06, 2000-07-31, 1900-01-01, 1978-04…
$ precinct_abbrv           <chr> NA, "CHOCO", NA, NA, NA, NA, NA, NA, NA, "BC…
$ precinct_desc            <chr> NA, "CHOCOWINITY", NA, NA, NA, NA, NA, NA, N…
$ municipality_abbrv       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "JNV…
$ municipality_desc        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "JON…
$ ward_abbrv               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ ward_desc                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ cong_dist_abbrv          <chr> NA, "01", NA, NA, NA, NA, NA, NA, NA, "11", …
$ cong_dist_desc           <chr> NA, "1ST CONGRESS", NA, NA, NA, NA, NA, NA, …
$ super_court_abbrv        <chr> NA, "02", NA, NA, NA, NA, NA, NA, NA, "30A",…
$ super_court_desc         <chr> NA, "2ND SUPERIOR COURT", NA, NA, NA, NA, NA…
$ judic_dist_abbrv         <chr> NA, "02", NA, NA, NA, NA, NA, NA, NA, "30", …
$ judic_dist_desc          <chr> NA, "2ND JUDICIAL", NA, NA, NA, NA, NA, NA, …
$ NC_senate_abbrv          <chr> NA, "01", NA, NA, NA, NA, NA, NA, NA, "50", …
$ NC_senate_desc           <chr> NA, "1ST SENATE", NA, NA, NA, NA, NA, NA, NA…
$ NC_house_abbrv           <chr> NA, "006", NA, NA, NA, NA, NA, NA, NA, "119"…
$ NC_house_desc            <chr> NA, "6TH HOUSE", NA, NA, NA, NA, NA, NA, NA,…
$ county_commiss_abbrv     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ county_commiss_desc      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ township_abbrv           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ township_desc            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ school_dist_abbrv        <chr> NA, "SD2", NA, NA, NA, NA, NA, NA, NA, NA, N…
$ school_dist_desc         <chr> NA, "SCHOOL #2", NA, NA, NA, NA, NA, NA, NA,…
$ fire_dist_abbrv          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ fire_dist_desc           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ water_dist_abbrv         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ water_dist_desc          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ sewer_dist_abbrv         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ sewer_dist_desc          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ sanit_dist_abbrv         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ sanit_dist_desc          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ rescue_dist_abbrv        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ rescue_dist_desc         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ munic_dist_abbrv         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ munic_dist_desc          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ dist_1_abbrv             <chr> NA, "02", NA, NA, NA, NA, NA, NA, NA, "30", …
$ dist_1_desc              <chr> NA, "2ND PROSECUTORIAL", NA, NA, NA, NA, NA,…
$ dist_2_abbrv             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ dist_2_desc              <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ confidential_ind         <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N",…
$ cancellation_dt          <dttm> NA, 2001-07-06, 2001-02-05, NA, 2001-03-15,…
$ vtd_abbrv                <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ vtd_desc                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ load_dt                  <dttm> 2014-07-15 22:21:54, 2014-07-15 22:21:54, 2…
$ age_group                <chr> "41 TO 65", "26 TO 40", "UNKNOWN", "41 TO 65…
skimr::skim(d)
Warning in grepl("^\\s+$", x): input string 3907396 is invalid in this locale
Warning in grepl("^\\s+$", x): input string 3975334 is invalid in this locale
Warning in grepl("^\\s+$", x): input string 388213 is invalid in this locale
Warning in grepl("^\\s+$", x): input string 503879 is invalid in this locale
Warning in grepl("^\\s+$", x): input string 817815 is invalid in this locale
Warning in grepl("^\\s+$", x): input string 7446786 is invalid in this locale
Warning in grepl("^\\s+$", x): input string 7446791 is invalid in this locale
Table 3.1: Data summary
Name d
Number of rows 8003293
Number of columns 90
_______________________
Column type frequency:
character 59
logical 20
numeric 7
POSIXct 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
county_desc 0 1.00 3 12 0 100 0
voter_reg_num 0 1.00 1 12 0 2708878 0
status_cd 2 1.00 1 1 0 5 0
voter_status_desc 2 1.00 6 22 0 5 0
reason_cd 238 1.00 2 2 0 26 0
voter_status_reason_desc 238 1.00 8 56 0 26 0
last_name 122 1.00 1 23 0 269312 0
first_name 254 1.00 1 19 0 176806 0
midl_name 553015 0.93 1 20 0 249768 0
name_sufx_cd 7561920 0.06 1 3 0 222 0
street_dir 7409655 0.07 1 2 0 15 0
street_name 7768 1.00 1 30 0 122064 0
street_type_cd 527462 0.93 1 4 0 215 0
street_sufx_cd 7698925 0.04 1 3 0 15 0
unit_num 7020919 0.12 1 7 0 32785 0
res_city_desc 3750 1.00 3 20 0 856 0
state_cd 7277 1.00 1 2 0 20 0
mail_addr1 6814780 0.15 1 40 0 421307 0
mail_city 6819798 0.15 1 30 0 4168 0
mail_state 6819868 0.15 1 2 0 104 0
phone_num 5370357 0.33 1 7 0 1539509 0
race_code 0 1.00 1 1 0 7 0
race_desc 0 1.00 5 34 0 7 0
ethnic_code 0 1.00 2 2 0 3 0
ethnic_desc 0 1.00 12 26 0 3 0
party_cd 0 1.00 3 3 0 4 0
party_desc 0 1.00 10 13 0 5 0
sex_code 0 1.00 1 1 0 3 0
sex 0 1.00 3 6 0 3 0
birth_place 1716730 0.79 2 2 0 56 0
precinct_abbrv 1865111 0.77 1 6 0 1867 0
precinct_desc 1865111 0.77 2 30 0 2686 0
municipality_abbrv 4396616 0.45 1 4 0 429 0
municipality_desc 4396616 0.45 4 26 0 571 0
ward_abbrv 6116249 0.24 1 4 0 197 0
ward_desc 6116249 0.24 1 28 0 256 0
cong_dist_abbrv 1865114 0.77 2 2 0 13 0
cong_dist_desc 1865114 0.77 2 27 0 46 0
super_court_abbrv 1872590 0.77 2 4 0 68 0
super_court_desc 1872590 0.77 2 30 0 78 0
judic_dist_abbrv 1872576 0.77 2 3 0 40 0
judic_dist_desc 1872576 0.77 2 23 0 54 0
NC_senate_abbrv 1836472 0.77 2 2 0 50 0
NC_senate_desc 1836472 0.77 6 24 0 63 0
NC_house_abbrv 1829345 0.77 3 3 0 120 0
NC_house_desc 1829345 0.77 6 25 0 125 0
county_commiss_abbrv 4365150 0.45 1 4 0 126 0
county_commiss_desc 4365150 0.45 2 30 0 131 0
township_abbrv 6760420 0.16 1 4 0 119 0
township_desc 6760420 0.16 1 27 0 223 0
school_dist_abbrv 3380612 0.58 1 7 0 140 0
school_dist_desc 3380612 0.58 2 30 0 145 0
fire_dist_abbrv 7650404 0.04 1 4 0 82 0
fire_dist_desc 7650404 0.04 5 27 0 107 0
rescue_dist_desc 7885291 0.01 10 16 0 13 0
dist_1_abbrv 1865111 0.77 2 3 0 39 0
dist_1_desc 1865111 0.77 2 27 0 51 0
confidential_ind 0 1.00 1 1 0 2 0
age_group 0 1.00 7 12 0 6 0

Variable type: logical

skim_variable n_missing complete_rate mean count
ncid 8003293 0 NaN :
absent_ind 8003293 0 NaN :
name_prefx_cd 8003293 0 NaN :
half_code 8002085 0 0.38 FAL: 752, TRU: 456
unit_designator 8003293 0 NaN :
mail_addr2 8003292 0 1.00 TRU: 1
mail_addr3 8003293 0 NaN :
mail_addr4 8003293 0 NaN :
water_dist_abbrv 7998651 0 1.00 TRU: 4642
water_dist_desc 8000971 0 1.00 TRU: 2322
sewer_dist_abbrv 8002465 0 1.00 TRU: 828
sewer_dist_desc 8003293 0 NaN :
sanit_dist_abbrv 7997607 0 0.11 FAL: 5069, TRU: 617
sanit_dist_desc 8003293 0 NaN :
munic_dist_abbrv 8002280 0 1.00 TRU: 1013
munic_dist_desc 8002280 0 1.00 TRU: 1013
dist_2_abbrv 8003293 0 NaN :
dist_2_desc 8003293 0 NaN :
vtd_abbrv 8003293 0 NaN :
vtd_desc 8003293 0 NaN :

Variable type: numeric

skim_variable      n_missing  complete_rate  mean         sd           p0      p25    p50    p75    p100        hist
county_id          0          1.00           51.96        27.31        1       32     51     74     100         ▅▇▇▆▆
house_num          0          1.00           2664.17      706533.11    0       210    900    3032   1400000000  ▇▁▁▁▁
zip_code           17957      1.00           30806.46     890299.61    0       27523  28027  28401  289309205   ▇▁▁▁▁
mail_zipcode       6819826    0.15           24463505.17  78280243.02  -27379  27812  28345  28699  987725001   ▇▁▁▁▁
area_cd            5621640    0.30           696.09       259.80       -83     336    828    910    999         ▁▃▁▂▇
age                0          1.00           48.71        21.28        0       34     46     60     7644        ▇▁▁▁▁
rescue_dist_abbrv  7885291    0.01           47.54        10.66        12      41     54     55     88          ▁▃▇▁▁

Variable type: POSIXct

skim_variable    n_missing  complete_rate  min                  max                  median               n_unique
snapshot_dt      0          1.00           2005-11-25 00:00:00  2005-11-25 00:00:00  2005-11-25 00:00:00  1
registr_dt       0          1.00           1805-08-01 00:00:00  9999-10-21 00:00:00  1995-02-22 00:00:00  75089
cancellation_dt  6240946    0.22           1988-12-06 00:00:00  2005-11-23 00:00:00  2003-01-13 00:00:00  3975
load_dt          0          1.00           2014-07-15 22:21:54  2014-07-15 22:21:54  2014-07-15 22:21:54  1
  • The warning messages from skim() indicate that a handful of values contain characters that are invalid in the current locale. If they occur in rows we use, they will have to be located and dealt with.
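One possible way to locate such values is sketched below. It assumes the session runs in a UTF-8 locale, so that "invalid in this locale" corresponds to strings that are not valid UTF-8; that is an assumption, not something established above. The helper name find_invalid_utf8 and the toy data are hypothetical.

```r
# Hypothetical helper: for each character column, return the row numbers
# whose values are not valid UTF-8. Missing values are not reported.
find_invalid_utf8 <- function(df) {
  chr_cols <- names(df)[vapply(df, is.character, logical(1))]
  bad <- lapply(df[chr_cols], function(x) which(!is.na(x) & !validUTF8(x)))
  bad[lengths(bad) > 0]  # keep only the columns with at least one problem
}

toy <- data.frame(
  last_name = c("SMITH", "MU\xf1OZ", NA),  # "\xf1" is a Latin-1 byte, invalid as UTF-8
  age = c(30, 40, 50),
  stringsAsFactors = FALSE
)
invalid <- find_invalid_utf8(toy)
invalid
```

The result is a named list of row numbers, one element per affected column, which can then be compared against the rows retained for analysis.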
# clean up
rm(d)
gc()
          used (Mb) gc trigger   (Mb)   max used   (Mb)
Ncells  916882 49.0   10539724  562.9    7542601  402.9
Vcells 5794339 44.3  808709159 6170.0 1010709607 7711.2
# get data for next section of analyses
d <- fst::read_fst(
  d_fst,
  columns = c("county_id", "county_desc", "voter_reg_num", "ncid", "status_cd", 
              "voter_status_desc", "reason_cd", "voter_status_reason_desc")
  ) %>% 
  tibble::as_tibble()
dim(d)
[1] 8003293       8

3.1 county_id & county_desc

county_id: County identification number
county_desc: County description

summary(d$county_id)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   32.00   51.00   51.96   74.00  100.00 
table(d$county_id)

     1      2      3      4      5      6      7      8      9     10     11 
142978  32527  10606  21692  23969  19839  41680  17955  30594  83782 196729 
    12     13     14     15     16     17     18     19     20     21     22 
 76685 129792  68065   8558  64968  19521 158203  45262  27093  11937  10507 
    23     24     25     26     27     28     29     30     31     32     33 
 74946  49306  98242 250411  20356  33701 124906  27493  34816 324683  51893 
    34     35     36     37     38     39     40     41     42     43     44 
350882  41914 140041   8765   8810  40697  15672 473739  43500  74559  63376 
    45     46     47     48     49     50     51     52     53     54     55 
101988  20477  29683   5004 103777  39634 111748  10102  40144  50822  62544 
    56     57     58     59     60     61     62     63     64     65     66 
 31436  19721  24684  38463 697897  15768  24296  67268  81129 185852  17200 
    67     68     69     70     71     72     73     74     75     76     77 
106315 227603  15232  40871  40743  10037  30596 179177  23721 107895  36649 
    78     79     80     81     82     83     84     85     86     87     88 
 90736  87196 115178  49978  45743  28494  52563  37016  53848  14744  38191 
    89     90     91     92     93     94     95     96     97     98     99 
  3445 122676  39355 678226  19534  15399  59440  81209  55298  70609  31275 
   100 
 19014 
  • Never missing
  • Integer 1 .. 100
table(d$county_desc)

    ALAMANCE    ALEXANDER    ALLEGHANY        ANSON         ASHE        AVERY 
      142978        32527        10606        21692        23969        19839 
    BEAUFORT       BERTIE       BLADEN    BRUNSWICK     BUNCOMBE        BURKE 
       41680        17955        30594        83782       196729        76685 
    CABARRUS     CALDWELL       CAMDEN     CARTERET      CASWELL      CATAWBA 
      129792        68065         8558        64968        19521       158203 
     CHATHAM     CHEROKEE       CHOWAN         CLAY    CLEVELAND     COLUMBUS 
       45262        27093        11937        10507        74946        49306 
      CRAVEN   CUMBERLAND    CURRITUCK         DARE     DAVIDSON        DAVIE 
       98242       250411        20356        33701       124906        27493 
      DUPLIN       DURHAM    EDGECOMBE      FORSYTH     FRANKLIN       GASTON 
       34816       324683        51893       350882        41914       140041 
       GATES       GRAHAM    GRANVILLE       GREENE     GUILFORD      HALIFAX 
        8765         8810        40697        15672       473739        43500 
     HARNETT      HAYWOOD    HENDERSON     HERTFORD         HOKE         HYDE 
       74559        63376       101988        20477        29683         5004 
     IREDELL      JACKSON     JOHNSTON        JONES          LEE       LENOIR 
      103777        39634       111748        10102        40144        50822 
     LINCOLN        MACON      MADISON       MARTIN     MCDOWELL  MECKLENBURG 
       62544        31436        19721        24684        38463       697897 
    MITCHELL   MONTGOMERY        MOORE         NASH  NEW HANOVER  NORTHAMPTON 
       15768        24296        67268        81129       185852        17200 
      ONSLOW       ORANGE      PAMLICO   PASQUOTANK       PENDER   PERQUIMANS 
      106315       227603        15232        40871        40743        10037 
      PERSON         PITT         POLK     RANDOLPH     RICHMOND      ROBESON 
       30596       179177        23721       107895        36649        90736 
  ROCKINGHAM        ROWAN   RUTHERFORD      SAMPSON     SCOTLAND       STANLY 
       87196       115178        49978        45743        28494        52563 
      STOKES        SURRY        SWAIN TRANSYLVANIA      TYRRELL        UNION 
       37016        53848        14744        38191         3445       122676 
       VANCE         WAKE       WARREN   WASHINGTON      WATAUGA        WAYNE 
       39355       678226        19534        15399        59440        81209 
      WILKES       WILSON       YADKIN       YANCEY 
       55298        70609        31275        19014 
  • Never missing
  • 100 unique values

They look reasonable, to the extent that I can tell without knowing anything about the counties.
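The two tables list the same counts in the same order, which suggests that county_id is simply the alphabetical rank of county_desc. A minimal sketch of how that could be checked, shown on invented toy data and using base R only:

```r
toy <- data.frame(
  county_id = c(1, 2, 3, 2, 1),
  county_desc = c("ALAMANCE", "ALEXANDER", "ALLEGHANY", "ALEXANDER", "ALAMANCE"),
  stringsAsFactors = FALSE
)

# Each ID should map to exactly one description, and vice versa
pairs <- unique(toy[c("county_id", "county_desc")])
one_to_one <- !anyDuplicated(pairs$county_id) && !anyDuplicated(pairs$county_desc)

# Does each ID equal the alphabetical rank of its description?
# (sort() order depends on the locale, so treat this as indicative only)
alpha_rank <- match(pairs$county_desc, sort(unique(pairs$county_desc)))
ids_are_alphabetical <- all(pairs$county_id == alpha_rank)
```

If both flags are TRUE on the real data, the two columns carry the same information and only one of them needs to be kept.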

3.2 voter_reg_num

voter_reg_num: Voter registration number (unique by county)

table(d$voter_reg_num) %>% head(12)

           0 000000000000 000000000001 000000000002 000000000003 000000000004 
           1           10           56           64           65           66 
000000000005 000000000006 000000000007 000000000008 000000000009 000000000010 
          61           65           70           64           75           71 
table(d$voter_reg_num) %>% tail(12)

000999834828 000999834834 000999834837 000999834845 000999834860 000999834869 
           1            1            1            1            1            1 
000999834879 000999834883 000999834884 000999834888 000999834892 000999834900 
           1            1            1            1            1            1 
summary(as.integer(d$voter_reg_num))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
        0     36265    155221   5459965   3039980 999834900 
d$voter_reg_num %>% stringr::str_length() %>% table(useNA = "ifany")
.
      1      12 
      1 8003292 
  • ~2.7M unique values
  • Never missing
  • Integer 0 .. ~1,000M (as strings)
  • Looks like they should be 12-digit integers with leading zeroes
  • Exactly one observation is short
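The expected format can also be tested directly with a regular expression. This is a sketch on invented toy values, assuming that "well formed" means exactly 12 ASCII digits:

```r
voter_reg_num <- c("000000000001", "000999834900", "0", NA)

# TRUE only for strings made of exactly 12 digits (NA gives FALSE)
is_12_digits <- grepl("^[0-9]{12}$", voter_reg_num)

# Positions of values that are present but malformed
malformed <- which(!is_12_digits & !is.na(voter_reg_num))
malformed
```

Unlike a length check, this would also catch 12-character values that contain non-digit characters.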

Look at the record with the short value.

d %>% 
  dplyr::filter(stringr::str_length(voter_reg_num) < 12) %>% 
  dplyr::select(county_id, voter_reg_num, status_cd, voter_status_desc, reason_cd, voter_status_reason_desc) %>% 
  knitr::kable()
county_id  voter_reg_num  status_cd  voter_status_desc  reason_cd  voter_status_reason_desc
18         0              R          REMOVED            RL         MOVED FROM COUNTY
  • There is only one short value. It can be ignored because that record will later be excluded from the data set on the basis of its status (REMOVED, i.e. not active). (I intend to later restrict the data set to active voters only because, to the greatest extent possible, I want no duplicate records in the data used for the analyses.)

Check whether county_id x voter_reg_num is unique, as claimed.

d %>% 
  dplyr::select(county_id, voter_reg_num) %>% 
  dplyr::mutate(id = stringr::str_c(as.character(county_id), ".", voter_reg_num)) %>% 
  dplyr::count(id) %>% 
  with(table(n))
n
      1 
8003293 
  • county_id x voter_reg_num is unique, even including observations flagged as duplicates.
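The same question can be asked without constructing an explicit id column. A minimal base R sketch on invented toy data, in which the last row deliberately repeats an earlier key:

```r
toy <- data.frame(
  county_id = c(1, 1, 2, 2),
  voter_reg_num = c("000000000001", "000000000002", "000000000001", "000000000001"),
  stringsAsFactors = FALSE
)

key <- toy[c("county_id", "voter_reg_num")]
n_dup <- sum(duplicated(key))  # rows whose key repeats an earlier row
is_unique <- n_dup == 0
```

On the real data, is_unique should be TRUE if the compound key behaves as claimed. I have not compared the speed of this against the count-based check above on 8 million rows.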

3.3 ncid

ncid: North Carolina identification number (NCID) of voter

  • Always missing

That’s a shame. It would have been useful.

3.4 status_cd & voter_status_desc

status_cd: Status code for voter registration
voter_status_desc: Status code description

table(d$status_cd, useNA = "always")

      A       D       I       R       S    <NA> 
4914521   41348  495603 2546485    5334       2 
table(d$voter_status_desc, useNA = "always")

                ACTIVE                 DENIED               INACTIVE 
               4914521                  41348                 495603 
               REMOVED TEMPORARY REGISTRATION                   <NA> 
               2546485                   5334                      2 
  • 5 unique nonmissing values
  • 2 records with missing values
  • ~4.9M active records

3.5 reason_cd & voter_status_reason_desc

reason_cd: Reason code for voter registration status
voter_status_reason_desc: Reason code description

table(d$reason_cd, useNA = "always")

     A1      A2      AA      AL      AN      AP      AV      DI      DU      IL 
  13737   71296      50  523899    7517  198333 4100220    6991   34357   10585 
     IN      IU      R2      RA      RC      RD      RF      RL      RM      RP 
 181320  303197   78951   59008     662  443486   63501  888056  551073  367511 
     RQ      RS      RT      SM      SO      SP    <NA> 
   4194   89049     729    3975    1307      51     238 
table(d$voter_status_reason_desc, useNA = "always")

                                          ADMINISTRATIVE 
                                                   59008 
                                            ARMED FORCES 
                                                      50 
                               CONFIRMATION NOT RETURNED 
                                                  181320 
                                    CONFIRMATION PENDING 
                                                   71296 
                     CONFIRMATION RETURNED UNDELIVERABLE 
                                                  303197 
                                                DECEASED 
                                                  443486 
                                               DUPLICATE 
                                                   78951 
                                       FELONY CONVICTION 
                                                   63501 
                                     LEGACY - CONVERSION 
                                                   10585 
                                             LEGACY DATA 
                                                  523899 
                                                MILITARY 
                                                    3975 
                                       MOVED FROM COUNTY 
                                                  888056 
                                        MOVED FROM STATE 
                                                   89049 
                                        OVERSEAS CITIZEN 
                                                    1307 
                                   PREVIOUSLY REGISTERED 
                                                      51 
REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS 
                                                  551073 
                      REMOVED DUE TO SUSTAINED CHALLENGE 
                                                     662 
                             REMOVED UNDER OLD PURGE LAW 
                                                  367511 
                                      REQUEST FROM VOTER 
                                                    4194 
                                    TEMPORARY REGISTRANT 
                                                     729 
                       UNAVAILABLE ESSENTIAL INFORMATION 
                                                    6991 
                                              UNVERIFIED 
                                                   13737 
                                          UNVERIFIED NEW 
                                                    7517 
                                    VERIFICATION PENDING 
                                                  198333 
                     VERIFICATION RETURNED UNDELIVERABLE 
                                                   34357 
                                                VERIFIED 
                                                 4100220 
                                                    <NA> 
                                                     238 
  • 26 unique nonmissing values
  • 238 records with missing values
  • ~4.1M verified records

Look at the relationship between status and status reason.

table(
  stringr::str_trunc(d$voter_status_reason_desc, 25), 
  stringr::str_trunc(d$voter_status_desc, 8), 
  useNA = "always"
)
                           
                             ACTIVE  DENIED INACTIVE REMOVED TEMPO...    <NA>
  ADMINISTRATIVE                  0       0        0   59008        0       0
  ARMED FORCES                   50       0        0       0        0       0
  CONFIRMATION NOT RETURNED       0       0   181320       0        0       0
  CONFIRMATION PENDING        71295       0        0       1        0       0
  CONFIRMATION RETURNED ...       0       0   303197       0        0       0
  DECEASED                        0       0        0  443486        0       0
  DUPLICATE                       0       0        0   78951        0       0
  FELONY CONVICTION               0       0        0   63501        0       0
  LEGACY - CONVERSION             1       0    10584       0        0       0
  LEGACY DATA                523897       0        2       0        0       0
  MILITARY                        0       0        0       0     3975       0
  MOVED FROM COUNTY               0       0        0  888055        0       1
  MOVED FROM STATE                0       0        0   89049        0       0
  OVERSEAS CITIZEN                0       0        0       0     1307       0
  PREVIOUSLY REGISTERED           0       0        0       1       50       0
  REMOVED AFTER 2 FED GE...       0       0        0  551072        0       1
  REMOVED DUE TO SUSTAIN...       0       0        0     662        0       0
  REMOVED UNDER OLD PURG...       0       0        0  367511        0       0
  REQUEST FROM VOTER              0       0        0    4194        0       0
  TEMPORARY REGISTRANT            0       0        0     729        0       0
  UNAVAILABLE ESSENTIAL ...       0    6990        0       1        0       0
  UNVERIFIED                  13731       0        0       4        2       0
  UNVERIFIED NEW               7516       0        0       1        0       0
  VERIFICATION PENDING       198331       0        1       1        0       0
  VERIFICATION RETURNED ...       0   34357        0       0        0       0
  VERIFIED                  4099700       1      499      20        0       0
  <NA>                            0       0        0     238        0       0
  • voter_status_desc == “ACTIVE” & voter_status_reason_desc == “VERIFIED”

    • Most likely to be error free (based on common-sense interpretation of the labels)
    • ~4.1M observations (4,099,700 in the cross-tabulation above)
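
A minimal sketch of how that subset could be selected. The data frame `toy` is a made-up stand-in for `d`, and the object names `keep` and `toy_verified` are illustrative only. Base R is used so the example runs without extra packages; it is not necessarily how the subset is selected later in this analysis.

```r
# Toy stand-in for d (assumption); only the two status columns matter here.
toy <- data.frame(
  voter_status_desc        = c("ACTIVE", "ACTIVE", "REMOVED", "INACTIVE", NA),
  voter_status_reason_desc = c("VERIFIED", "LEGACY DATA", "DECEASED", "VERIFIED", "VERIFIED"),
  stringsAsFactors = FALSE
)

# which() keeps only rows where the condition is TRUE,
# so rows where the comparison evaluates to NA are dropped.
keep <- which(
  toy$voter_status_desc == "ACTIVE" &
    toy$voter_status_reason_desc == "VERIFIED"
)
toy_verified <- toy[keep, ]
nrow(toy_verified)
```

Wrapping the condition in `which()` matters when either status column can be missing: plain logical indexing with an `NA` would return a row of `NA`s rather than dropping the record.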

3.6 Name standardisation

Identify any oddities about the name fields that might benefit from standardisation.

I will do this on all the rows, not just the subset to be analysed. I expect the oddities to be much the same whether or not the rows are later excluded from the analyses, and the larger sample size will help in spotting rare problems.

I will look at the three name fields concurrently because I expect the oddities to be similar across the name fields.

  • last_name: Voter last name
  • first_name: Voter first name
  • midl_name: Voter middle name

Look for possible anomalies in names.

# clean up
rm(d)
gc()
           used (Mb) gc trigger   (Mb)   max used   (Mb)
Ncells   924143 49.4   32963065 1760.5   41203831 2200.6
Vcells 10006560 76.4  414059091 3159.1 1010709607 7711.2
# get data for next section of analyses
d <- fst::read_fst(
  d_fst,
  columns = c(
    "last_name", "first_name", "midl_name", "name_sufx_cd", 
    "sex", "age", "voter_status_desc", "voter_status_reason_desc"
  )
) %>% 
  tibble::as_tibble()
dim(d)
[1] 8003293       8

3.6.1 Name missing

d %>% with(table(is.na(last_name)))

  FALSE    TRUE 
8003171     122 
d %>% with(table(is.na(first_name)))

  FALSE    TRUE 
8003039     254 
d %>% with(table(is.na(midl_name)))

  FALSE    TRUE 
7450278  553015 
  • A very small fraction of last names (122) and first names (254) are missing. We don’t expect them to be missing.
  • A substantial fraction of middle names are missing (553,015 of 8,003,293, about 6.9%). This is expected, as middle names are not mandatory.
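
The three separate tabulations above can also be summarised in one step. A minimal sketch on made-up data (`toy` is a hypothetical stand-in for the three name columns of `d`):

```r
# Toy stand-in for the three name columns of d (assumption).
toy <- data.frame(
  last_name  = c("SMITH", NA, "JONES", "BROWN"),
  first_name = c("ANN", "BOB", NA, "CAROL"),
  midl_name  = c(NA, NA, "LEE", "MAE"),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix,
# so column sums give counts and column means give proportions.
n_missing <- colSums(is.na(toy))
p_missing <- colMeans(is.na(toy))

# Percentage missing per column, rounded to one decimal place.
round(100 * p_missing, 1)
```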

Look at the records missing last or first names to see if there is some explanation for their absence.

# last name missing
d %>% 
  dplyr::filter(is.na(last_name)) %>% 
  dplyr::select(
    first_name, midl_name, name_sufx_cd, 
    sex, age,
    voter_status_desc, voter_status_reason_desc
    ) %>% 
  dplyr::arrange(voter_status_desc, voter_status_reason_desc, first_name) %>% 
  knitr::kable()
first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
CHRISTINA GAYLE NA FEMALE 27 REMOVED ADMINISTRATIVE
STEPHANIE ELISE NA FEMALE 25 REMOVED ADMINISTRATIVE
WILLIAM TODD NA MALE 41 REMOVED ADMINISTRATIVE
A J NA FEMALE 94 REMOVED DECEASED
ALBERT FREEMAN NA MALE 82 REMOVED DECEASED
BROUNDA KAY NA FEMALE 58 REMOVED DECEASED
CLARENCE EDWARD NA MALE 85 REMOVED DECEASED
COLON WALTER NA MALE 71 REMOVED DECEASED
ELOISE L NA FEMALE 0 REMOVED DECEASED
GENE EDWARD NA MALE 74 REMOVED DECEASED
HELEN KOOPS NA FEMALE 89 REMOVED DECEASED
JAMES A NA MALE 75 REMOVED DECEASED
JAMES EARL NA MALE 69 REMOVED DECEASED
JOHN ROBERT NA MALE 87 REMOVED DECEASED
MARTHA BOATRIGHT NA FEMALE 77 REMOVED DECEASED
MELISSA O NA FEMALE 39 REMOVED DECEASED
VERA M NA FEMALE 76 REMOVED DECEASED
VOLA B NA FEMALE 98 REMOVED DECEASED
CHARLES EMMETT NA MALE 73 REMOVED DUPLICATE
FANNIE N NA FEMALE 77 REMOVED DUPLICATE
PATRICIA C NA FEMALE 75 REMOVED DUPLICATE
PAULINE NA NA FEMALE 56 REMOVED DUPLICATE
ROBERT ERIC NA MALE 40 REMOVED DUPLICATE
VIRGINIA L NA FEMALE 90 REMOVED DUPLICATE
WELDON COX NA MALE 76 REMOVED DUPLICATE
DEONTRAYVIA EMANUEL NA MALE 30 REMOVED FELONY CONVICTION
JANE ANN NA FEMALE 26 REMOVED FELONY CONVICTION
KIM LEE NA MALE 51 REMOVED FELONY CONVICTION
LEANDER WARREN NA MALE 43 REMOVED FELONY CONVICTION
MIKE J NA MALE 51 REMOVED FELONY CONVICTION
SHIRLEY GRIFFIN NA FEMALE 40 REMOVED FELONY CONVICTION
WESLEY WILSON NA MALE 41 REMOVED FELONY CONVICTION
WILLIAM RAY NA MALE 43 REMOVED FELONY CONVICTION
AMY DENISE NA FEMALE 34 REMOVED MOVED FROM COUNTY
ANDREA CROUCH NA FEMALE 35 REMOVED MOVED FROM COUNTY
CAROLYN MOORE NA FEMALE 56 REMOVED MOVED FROM COUNTY
DAVID DEAN NA MALE 38 REMOVED MOVED FROM COUNTY
FREDDA M NA FEMALE 82 REMOVED MOVED FROM COUNTY
JAMES DONALD III MALE 45 REMOVED MOVED FROM COUNTY
JESSIE H NA FEMALE 81 REMOVED MOVED FROM COUNTY
JUDITH A NA FEMALE 44 REMOVED MOVED FROM COUNTY
KATHLEEN LOUISE NA FEMALE 23 REMOVED MOVED FROM COUNTY
KELLY R NA FEMALE 38 REMOVED MOVED FROM COUNTY
LARRY ANTHONY SR MALE 46 REMOVED MOVED FROM COUNTY
LARRY DALLAS NA MALE 63 REMOVED MOVED FROM COUNTY
MARY MOSELEY NA FEMALE 46 REMOVED MOVED FROM COUNTY
MATTHEW JAMES NA MALE 25 REMOVED MOVED FROM COUNTY
MIRANDA MARIE NA FEMALE 23 REMOVED MOVED FROM COUNTY
NATALIE BASSHAM NA FEMALE 32 REMOVED MOVED FROM COUNTY
PATSY D NA FEMALE 50 REMOVED MOVED FROM COUNTY
SHIELA WEST NA FEMALE 57 REMOVED MOVED FROM COUNTY
STELLA NORWOOD NA FEMALE 41 REMOVED MOVED FROM COUNTY
NA NA NA UNK 0 REMOVED MOVED FROM COUNTY
HENRY RAY NA MALE 69 REMOVED MOVED FROM STATE
JASON M NA MALE 35 REMOVED MOVED FROM STATE
L KENT NA MALE 65 REMOVED MOVED FROM STATE
LINDA LOU NA FEMALE 58 REMOVED MOVED FROM STATE
ROBERT CARL NA MALE 56 REMOVED MOVED FROM STATE
ROY W NA MALE 0 REMOVED MOVED FROM STATE
NA NA NA UNK 0 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DUOC VAN DO UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
JEREMY SEAN NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
L F III MALE 58 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA 08 UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
  • All the voters missing last_name are REMOVED. Perhaps it’s a side-effect of the removal process.
# first name missing
d %>% 
  dplyr::filter(is.na(first_name)) %>% 
  dplyr::select(
    last_name, midl_name, name_sufx_cd, 
    sex, age,
    voter_status_desc, voter_status_reason_desc
    ) %>% 
  dplyr::arrange(voter_status_desc, voter_status_reason_desc, midl_name) %>% 
  knitr::kable()
last_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
TRIANOSKY SUSAN SMITH NA FEMALE 46 ACTIVE CONFIRMATION PENDING
JOSEY BETTY NA FEMALE 61 ACTIVE LEGACY DATA
PARRISH BRENDA NA FEMALE 59 ACTIVE LEGACY DATA
ROBINSON JACQUELINE P NA FEMALE 39 ACTIVE LEGACY DATA
UNDERWOOD REGINA NA FEMALE 46 ACTIVE LEGACY DATA
JONES LARRY MALLOR NA JR MALE 98 ACTIVE LEGACY DATA
HOLMAN HOWARD NA MALE 41 ACTIVE UNVERIFIED NEW
YABIN NA NA MALE 53 ACTIVE VERIFICATION PENDING
MORRIS ALEXANDER NA MALE 30 ACTIVE VERIFIED
BULLARD ALEXIS NA UNK 19 ACTIVE VERIFIED
ZIMMER CLIFFORD NA MALE 64 ACTIVE VERIFIED
CHESTER JAMES NA UNK 39 ACTIVE VERIFIED
ALEXANDER JASON NA MALE 28 ACTIVE VERIFIED
PATTERSON JOHN DEXTER III MALE 55 ACTIVE VERIFIED
MCKEEL LESTER NA MALE 77 ACTIVE VERIFIED
FRISBY M JR MALE 33 ACTIVE VERIFIED
FUQUA MARY NA FEMALE 59 ACTIVE VERIFIED
MOLET MICHAEL NA MALE 26 ACTIVE VERIFIED
KAUCHICK PAULINE NA FEMALE 26 ACTIVE VERIFIED
FUQUA WILLIAM NA MALE 63 ACTIVE VERIFIED
WARREN NA JD MALE 68 ACTIVE VERIFIED
FRYE WILLIAM C NA II MALE 50 ACTIVE VERIFIED
BURGESS NA NA FEMALE 29 ACTIVE VERIFIED
PHOENIX NA NA FEMALE 45 ACTIVE VERIFIED
JUDITH NA NA FEMALE 50 ACTIVE VERIFIED
MALIK NA NA MALE 33 ACTIVE VERIFIED
ELSASS NA NA MALE 37 ACTIVE VERIFIED
MAGENTA NA NA FEMALE 42 ACTIVE VERIFIED
GRAYWOLF NA NA MALE 57 ACTIVE VERIFIED
AMEN NA NA MALE 41 ACTIVE VERIFIED
SILVERMOON NA NA FEMALE 40 ACTIVE VERIFIED
PELKEY CHARES JR MALE 59 DENIED UNAVAILABLE ESSENTIAL INFORMATION
PITTS DARRYL NA MALE 19 DENIED VERIFICATION RETURNED UNDELIVERABLE
LE SON NA NA UNK 35 DENIED VERIFICATION RETURNED UNDELIVERABLE
WHITFIELD KAY M NA NA FEMALE 79 INACTIVE CONFIRMATION NOT RETURNED
MEDLIN ROBERT E NA FEMALE 0 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
BRICE NA NA MALE 33 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
CAPARCO NA JEN FEMALE 33 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MORRISON NA NA MALE 34 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
BALLARD LEIGH NA FEMALE 29 REMOVED ADMINISTRATIVE
COTTEN NA NA FEMALE 83 REMOVED ADMINISTRATIVE
SMITH NA NA UNK 0 REMOVED ADMINISTRATIVE
0000000072294 NA NA MALE 46 REMOVED ADMINISTRATIVE
ALSBROOKS ELEANOR NA FEMALE 90 REMOVED DECEASED
OXENDINE MITCHEL NA MALE 51 REMOVED DECEASED
WILLIS MOLLIE NA FEMALE 92 REMOVED DECEASED
ELLER RETA KATHLEE NA FEMALE 56 REMOVED DECEASED
CHITTY RUBEN D FEMALE 98 REMOVED DECEASED
SELENE NA NA FEMALE 57 REMOVED DECEASED
LOWRY NA NA MALE 48 REMOVED DECEASED
DE BRAGANZA NA NA MALE 93 REMOVED DECEASED
MIDDLETON C NA FEMALE 84 REMOVED DUPLICATE
BELL JAI-MIL NA FEMALE 23 REMOVED DUPLICATE
HWY LIBERA V MALE 69 REMOVED DUPLICATE
OWENS MICHELLE NA FEMALE 24 REMOVED DUPLICATE
WILTON SUSAN LORRAINE NA FEMALE 50 REMOVED DUPLICATE
ALLRED LINDA H NA NA FEMALE 66 REMOVED DUPLICATE
AMATO,KATHERINE,M NA NA FEMALE 50 REMOVED DUPLICATE
AMIDON,PETER,LEVENT NA NA MALE 33 REMOVED DUPLICATE
BEST,SYDNEY,ALLISON NA NA FEMALE 37 REMOVED DUPLICATE
BETHEA HAROLD LEE NA NA FEMALE 46 REMOVED DUPLICATE
BEVERLY CONSTANCE M NA NA FEMALE 37 REMOVED DUPLICATE
BOOZER ANNA KRISTEN NA NA FEMALE 36 REMOVED DUPLICATE
BOYD,ALLEN AUBREY,II NA NA MALE 35 REMOVED DUPLICATE
BRICE.MICHAEL ARTHUR NA NA MALE 37 REMOVED DUPLICATE
CARR,WENDELL,H JR NA NA MALE 36 REMOVED DUPLICATE
CATHEY,LONNIE,JR NA NA MALE 57 REMOVED DUPLICATE
CLARK JOANNE BENNETT NA NA FEMALE 63 REMOVED DUPLICATE
CUSTER,GEORGE D,JR NA NA MALE 51 REMOVED DUPLICATE
DAVID HYDE JR NA NA MALE 44 REMOVED DUPLICATE
DAVISKMICHAEL EDWARD NA NA MALE 51 REMOVED DUPLICATE
DUBUISSON ALLISON B NA NA FEMALE 53 REMOVED DUPLICATE
FORRIS FAY ANN NA NA FEMALE 39 REMOVED DUPLICATE
FULK,IVEY LEE,JR NA NA MALE 45 REMOVED DUPLICATE
GRIFFIN JANICE FAYE NA NA FEMALE 42 REMOVED DUPLICATE
HALL,PONTHEOLA,M NA NA FEMALE 53 REMOVED DUPLICATE
HANNER JO ANNE LONG NA NA FEMALE 61 REMOVED DUPLICATE
HODNETT,DORGIE,JR NA NA MALE 52 REMOVED DUPLICATE
HOGSHEAD,THOMAS H,JR NA NA MALE 66 REMOVED DUPLICATE
JENKINS,JAMES W,JR NA NA MALE 36 REMOVED DUPLICATE
JONES,JOHNSIE,H NA NA FEMALE 92 REMOVED DUPLICATE
KENNY MAHLON DAY NA NA MALE 84 REMOVED DUPLICATE
KEY,GENE SAMUEL,JR NA NA MALE 44 REMOVED DUPLICATE
LACKEY CAROL M NA NA FEMALE 70 REMOVED DUPLICATE
LAMBERT DAVID M NA NA MALE 43 REMOVED DUPLICATE
LESANE JACQUELINE NA NA FEMALE 35 REMOVED DUPLICATE
MAPP,DWIGHT,BENJAMIN NA NA MALE 57 REMOVED DUPLICATE
MAY ROBERT BRYAN NA NA FEMALE 87 REMOVED DUPLICATE
MCCARTHY LISA ANNE NA NA FEMALE 44 REMOVED DUPLICATE
MICHELMJOSEPH JOHN NA NA MALE 40 REMOVED DUPLICATE
NORTON MYRA WOODELL NA NA FEMALE 63 REMOVED DUPLICATE
PEDIGO BUFORD T NA NA MALE 96 REMOVED DUPLICATE
REDWINE MARK ALAN NA NA MALE 53 REMOVED DUPLICATE
ROUSE,ESTHER, MAE NA NA FEMALE 52 REMOVED DUPLICATE
RUPOLO SANDRA NA NA FEMALE 36 REMOVED DUPLICATE
SIMS,RAYMOND LEE,SR NA NA MALE 66 REMOVED DUPLICATE
URQUHART PARK VASCO NA NA MALE 53 REMOVED DUPLICATE
VALDEZ DONNA A NA NA FEMALE 43 REMOVED DUPLICATE
WALKER,CHARLES,JR NA NA MALE 56 REMOVED DUPLICATE
WESTMORELAND J C NA NA MALE 83 REMOVED DUPLICATE
WHITAKER,JAMES L,JR NA NA MALE 35 REMOVED DUPLICATE
WHITE,LEE E,JR NA NA FEMALE 35 REMOVED DUPLICATE
VAN DORSTEN NA NA FEMALE 105 REMOVED DUPLICATE
BENSON EUGENE NA MALE 60 REMOVED FELONY CONVICTION
STURDIVANT NA NA MALE 0 REMOVED FELONY CONVICTION
STURDIVANT NA NA MALE 0 REMOVED FELONY CONVICTION
BENTON BINARD NA FEMALE 46 REMOVED MOVED FROM COUNTY
JACOBS HUTTO NA FEMALE 29 REMOVED MOVED FROM COUNTY
HOLSHOUSER LOUISE NA FEMALE 23 REMOVED MOVED FROM COUNTY
GREEN LYNN NA FEMALE 42 REMOVED MOVED FROM COUNTY
JOHNSON MICHELLE NA FEMALE 28 REMOVED MOVED FROM COUNTY
BLICK MOORE NA FEMALE 53 REMOVED MOVED FROM COUNTY
MORRISON SAIN NA FEMALE 52 REMOVED MOVED FROM COUNTY
BURGOYNE STEPHANIE A NA FEMALE 55 REMOVED MOVED FROM COUNTY
BARNES VALRIE NA FEMALE 56 REMOVED MOVED FROM COUNTY
FEARS VANDERBILT JR MALE 45 REMOVED MOVED FROM COUNTY
PINION WAYNE NA MALE 63 REMOVED MOVED FROM COUNTY
SKELTON WILLIAM III MALE 40 REMOVED MOVED FROM COUNTY
RAINEY NA NA MALE 0 REMOVED MOVED FROM COUNTY
SKIA NA NA FEMALE 45 REMOVED MOVED FROM COUNTY
NA NA NA UNK 0 REMOVED MOVED FROM COUNTY
MAGENTA NA NA FEMALE 42 REMOVED MOVED FROM COUNTY
DE NA NA MALE 105 REMOVED MOVED FROM COUNTY
DE DEBORAH NA NA FEMALE 105 REMOVED MOVED FROM COUNTY
VAN EATON NA NA MALE 58 REMOVED MOVED FROM COUNTY
TUIT NA NA MALE 21 REMOVED MOVED FROM COUNTY
MARGO (ONLY NA FEMALE 62 REMOVED MOVED FROM STATE
LEWIS BUZBY NA FEMALE 42 REMOVED MOVED FROM STATE
RIVERS-MITCHELL TRINA SAGE NA FEMALE 30 REMOVED MOVED FROM STATE
BURNET UNNI KJOSNES NA FEMALE 72 REMOVED MOVED FROM STATE
HOCUTT CLAVON MORRIS NA NA MALE 58 REMOVED MOVED FROM STATE
SEXTON NA NA FEMALE 53 REMOVED MOVED FROM STATE
ST JOHN NA NA FEMALE 44 REMOVED MOVED FROM STATE
REARDON JOSEPH SR MALE 53 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
LOVE K NA MALE 81 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
MASTON MELISSA CHAN NA FEMALE 34 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
VON LOTHENHEIGER ROBIN NA FEMALE 43 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
NA NA NA UNK 0 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
HOLLOMAN NA R FEMALE 100 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BOSTIAN NA NA FEMALE 0 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
JOLLY NA NA FEMALE 98 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
GRAHAM GARLAND NA SR MALE 74 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
KANTHI NA NA FEMALE 56 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
SCHAN NA NA FEMALE 35 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
STEWART-WOODS MARY O NA NA FEMALE 53 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BEST CHARLES RAY JR MALE 55 REMOVED REMOVED UNDER OLD PURGE LAW
KAAS EDWARD FRE MALE 76 REMOVED REMOVED UNDER OLD PURGE LAW
DAVENPORT H NA FEMALE 98 REMOVED REMOVED UNDER OLD PURGE LAW
BEAUDION JOHN NA MALE 50 REMOVED REMOVED UNDER OLD PURGE LAW
GRAHAM JOHN NA MALE 85 REMOVED REMOVED UNDER OLD PURGE LAW
GUNTER LEE KLEIN NA FEMALE 43 REMOVED REMOVED UNDER OLD PURGE LAW
DANIELS MARION NA MALE 90 REMOVED REMOVED UNDER OLD PURGE LAW
WOOD NICOLE M NA FEMALE 58 REMOVED REMOVED UNDER OLD PURGE LAW
BORIS ROBERT NA MALE 63 REMOVED REMOVED UNDER OLD PURGE LAW
MOOREFIELD ROBERT STA MALE 49 REMOVED REMOVED UNDER OLD PURGE LAW
JORDAN TERRA NA FEMALE 32 REMOVED REMOVED UNDER OLD PURGE LAW
D’AIGNEAU TRACY ANN FEMALE 37 REMOVED REMOVED UNDER OLD PURGE LAW
NCT IS WRONG. SENT NA NA FEMALE 13 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
MV 5/17/95 NA NA MALE 89 REMOVED REMOVED UNDER OLD PURGE LAW
PRINCE ALICE KAY NA NA FEMALE 50 REMOVED REMOVED UNDER OLD PURGE LAW
LSEWHERE I NA NA FEMALE 39 REMOVED REMOVED UNDER OLD PURGE LAW
MILES IRENE K NA NA FEMALE 100 REMOVED REMOVED UNDER OLD PURGE LAW
CARROLL NA NA FEMALE 58 REMOVED REMOVED UNDER OLD PURGE LAW
STEPHENS JEFFRYN G NA NA FEMALE 61 REMOVED REMOVED UNDER OLD PURGE LAW
HENDERSON RAY MICH NA NA MALE 53 REMOVED REMOVED UNDER OLD PURGE LAW
MENENDEZ-ZALACAIN NA NA FEMALE 58 REMOVED REMOVED UNDER OLD PURGE LAW
LASSITE NA NA MALE 54 REMOVED REMOVED UNDER OLD PURGE LAW
MILLER JOHN KNOX NA NA MALE 83 REMOVED REMOVED UNDER OLD PURGE LAW
DEL ROSSO FRANCES NA NA FEMALE 81 REMOVED REMOVED UNDER OLD PURGE LAW
MEEKER MICHAEL GAI NA NA MALE 58 REMOVED REMOVED UNDER OLD PURGE LAW
PRICE INEZ KEETER NA NA FEMALE 72 REMOVED REMOVED UNDER OLD PURGE LAW
RUTT CHARLES E NA NA MALE 64 REMOVED REMOVED UNDER OLD PURGE LAW
SUNDSTROM MARY BRE NA NA FEMALE 55 REMOVED REMOVED UNDER OLD PURGE LAW
PENDERGRAPH ADA W NA NA FEMALE 105 REMOVED REMOVED UNDER OLD PURGE LAW
WILKINS TERESA ELL NA NA FEMALE 50 REMOVED REMOVED UNDER OLD PURGE LAW
FERRETTIJ THOMAS A NA NA MALE 59 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
BRADSHAW NA NA MALE 49 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
XXX NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
X NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NEW TEST NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
ARRINGTON JULI NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA 08 UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
N NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
0 NA NA FEMALE 0 REMOVED REMOVED UNDER OLD PURGE LAW
NA NA NA UNK 0 REMOVED REMOVED UNDER OLD PURGE LAW
  • Most are REMOVED, but some are ACTIVE (including VERIFIED), DENIED, or INACTIVE. The ACTIVE and VERIFIED cases suggest that the name fields of these records may have been entered or edited after verification.
  • Some appear to have the first name in the middle name field, e.g. (F, M, L) = (NA, "BRENDA", "PARRISH"), (NA, "ALEXIS", "BULLARD")
  • Some appear to have first and middle names appended to the last name, e.g. (F, M, L) = (NA, NA, "JONES LARRY MALLOR"), (NA, NA, "AMATO,KATHERINE,M")
  • Some are missing all the names!
  • Some appear to be test data, e.g. last name = "XXX" or "NEW TEST"

There are very few records missing first name or last name, and most of them have REMOVED status. The easiest thing to do is to drop those records.

Exclude records with missing first or last name
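
A minimal sketch of that exclusion on made-up data. The names `toy` and `toy_named` are illustrative only, and this is not necessarily how the exclusion is implemented later in the analysis.

```r
# Toy stand-in for d (assumption).
toy <- data.frame(
  last_name  = c("SMITH", NA, "JONES", NA),
  first_name = c("ANN", "BOB", NA, NA),
  midl_name  = c(NA, "LEE", "MAE", NA),
  stringsAsFactors = FALSE
)

# Keep rows where both last and first name are present.
# A missing middle name is allowed, because middle names are optional.
toy_named <- toy[!is.na(toy$last_name) & !is.na(toy$first_name), ]
nrow(toy_named)
```

Every record listed in the two tables above, including the apparent test records such as last name "XXX" or "NEW TEST", is missing a first or last name, so this one condition would remove all of them.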

3.6.2 Check for lower-case letters

d %>% dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "[a-z]"))
# A tibble: 50 x 1
   last_name
   <chr>    
 1 McCLURE  
 2 McCULLLEY
 3 DeNOON   
 4 DeSIMON  
 5 DeSIMON  
 6 DeVANE   
 7 DeVANE   
 8 LeMASTER 
 9 MaCDONELL
10 MaCDONELL
# … with 40 more rows
d %>% dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "[a-z]"))
# A tibble: 24 x 1
   first_name
   <chr>     
 1 JoANN     
 2 LaVERNE   
 3 BettyJEAN 
 4 JoANNE    
 5 LaWANDA   
 6 LaVAN     
 7 JoANN     
 8 LaDORA    
 9 JoANN     
10 SiROBERT  
# … with 14 more rows
d %>% dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "[a-z]"))
# A tibble: 169 x 1
   midl_name
   <chr>    
 1 McBRIDE  
 2 McBRIDE  
 3 McCLENNY 
 4 McLEAN   
 5 LaVERNE  
 6 McCLEASE 
 7 McDAY    
 8 McCOLLUM 
 9 McKINNIE 
10 McLAWHORN
# … with 159 more rows
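  • 243 name values contain lower-case letters (50 last, 24 first, 169 middle).
  • They occur in last, first, and middle names.
  • They are typically associated with a name prefix that could optionally be followed by a space, e.g. DeVANE could also be written De VANE.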

Map all letters to upper case
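
A minimal sketch of the upper-case mapping on made-up data, using base R `toupper()`. The names `toy` and `name_cols` are illustrative only.

```r
# Toy stand-in for the name columns of d (assumption).
name_cols <- c("last_name", "first_name", "midl_name")
toy <- data.frame(
  last_name  = c("McCLURE", "DeVANE", "SMITH"),
  first_name = c("JoANN", "LaVERNE", NA),
  midl_name  = c("McBRIDE", NA, "LEE"),
  stringsAsFactors = FALSE
)

# Apply toupper() to each name column; missing values stay missing.
toy[name_cols] <- lapply(toy[name_cols], toupper)

# Check: no lower-case letters remain in any name column.
any(grepl("[a-z]", unlist(toy[name_cols])))
```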

3.6.3 Digits

3.6.3.1 Check for digits

d %>% dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "[0-9]"))
# A tibble: 90 x 1
   last_name   
   <chr>       
 1 HOLLERS  111
 2 GALL0WAY    
 3 MV 5/17/95  
 4 01          
 5 YARBOR0     
 6 J0HNSON     
 7 LEAK 111    
 8 BURT0N      
 9 REYN0LDS    
10 4MCMANUS    
# … with 80 more rows
d %>% dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "[0-9]"))
# A tibble: 81 x 1
   first_name
   <chr>     
 1 HERM0N    
 2 BL0SSIE   
 3 J0HN      
 4 J0HNNY    
 5 MAJ0R     
 6 J0NATHAN  
 7 J0SEPH    
 8 L0RI      
 9 LEPOLE0N  
10 J0 ELLEN  
# … with 71 more rows
d %>% dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "[0-9]"))
# A tibble: 299 x 1
   midl_name      
   <chr>          
 1 MIZELLE25248249
 2 0VERTON        
 3 111            
 4 RAY 1.         
 5 0DELL          
 6 OLLIE 111      
 7 ARGUS 4TH      
 8 3RD.           
 9 LYN451         
10 JAMES 111      
# … with 289 more rows
  • Some have a zero substituted for the letter O, e.g. J0HNSON, BURT0N
  • Some are obviously generation suffixes, e.g. ARGUS 4TH, LEAK 111 (should be LEAK III)
  • Some look like the result of poor parsing into fields, or have non-name data appended, e.g. MV 5/17/95, MIZELLE25248249

Look at the digits individually.
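
Before going through the digits one at a time, a quick overview of how often each digit occurs can help to prioritise. A minimal sketch on a made-up vector of names (`toy_names` is hypothetical; in practice it would be one of the name columns of `d`):

```r
# Toy stand-in for one name column (assumption).
toy_names <- c("J0HNSON", "LEAK 111", "MV 5/17/95", "SMITH", "M00RE", NA)

# For each digit 0-9, count how many names contain it at least once.
# grepl() returns FALSE for NA, so missing names are never counted.
digits <- as.character(0:9)
n_with_digit <- vapply(
  digits,
  function(dgt) sum(grepl(dgt, toy_names, fixed = TRUE)),
  integer(1)
)
n_with_digit
```

The result is a named integer vector with one entry per digit. A name containing several different digits is counted once under each of them.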

3.6.3.2 Check for zero

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "0"))
dim(x)
[1] 67  1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
 [1] "0"                 "0000000072294"     "01"               
 [4] "0WENS"             "ALEM0N"            "AN0Y0"            
 [7] "BISH0P"            "BOLAD0"            "BURT0N"           
[10] "C0NNOR"            "C0STNER"           "CAPUT0"           
[13] "CAUSIEESTK0-LEE"   "CL0NTZ"            "CLEMM0NS"         
[16] "CONN0R"            "CONR0Y"            "CR0NE"            
[19] "D0LLARS"           "D0WNS"             "DAWY0T"           
[22] "DIVINCENZ0"        "EAT0N"             "ESC0BEDO"         
[25] "FERGUS0N"          "FERNANDEZ-BRAV0"   "GALL0WAY"         
[28] "GOM0"              "GUARDAD0"          "HIGUER0-JAMES"    
[31] "J0HNSON"           "JOHNS0N"           "JORDAN-R0BERTS"   
[34] "KEAT0N"            "KOCH0NEAL"         "KONI0R"           
[37] "L0CKLEAR"          "MCC0Y"             "MCD0UGAL"         
[40] "ND0H"              "OCONN0R"           "P0RTER"           
[43] "P0WERS"            "PEREZ-NAVARR0"     "PULL0"            
[46] "R0CCANOVA"         "R0CCO"             "R0DRIGUEZ"        
[49] "REYN0LDS"          "ROSK0S-SHAMBERGER" "RUSS0"            
[52] "SAMARG0"           "SCAMARD0"          "SIMPS0N"          
[55] "SOLTER0"           "SOOTO0"            "ST0LTZ"           
[58] "TANHEHC0"          "TAYL0R"            "THOMPS0N"         
[61] "WINST0N"           "WIT0SKY"           "WO0DARD"          
[64] "YARBOR0"           "YATSK0"           
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "0"))
dim(x)
[1] 73  1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
 [1] "0"           "ALLIS0N"     "ALONZ0"      "ANDREA-0"    "ANTONI0"    
 [6] "AZAVI0US"    "B0BBY"       "B0NNIE"      "B0YCE"       "BL0SSIE"    
[11] "C0LBY"       "C0RDELIA"    "CAR0LE"      "CAR0LYN"     "CHERYL0N"   
[16] "CHRIST0PHER" "D0LORES"     "D0NNA"       "DELI0"       "DONNA CAR0" 
[21] "DOR0THY"     "GREG0RY"     "HERM0N"      "J0"          "J0 ANN"     
[26] "J0 ELLEN"    "J0AN"        "J0HN"        "J0HNNY"      "J0NATHAN"   
[31] "J0SEPH"      "JONATH0N"    "K0LTON"      "KAR0N"       "L0RI"       
[36] "L0UIZETTA"   "LEPOLE0N"    "M0NICA"      "M0NIKA"      "MAJ0R"      
[41] "MARI0N"      "MARY-J0"     "MICHAEL TR0" "NAT0SHA"     "ORLAND0"    
[46] "OTH0"        "P0LLY"       "PLACID0"     "R0BERT"      "R0Y"        
[51] "REYNALD0"    "RODRIG0"     "S0NTE"       "SHANN0N"     "T0NYA"      
[56] "TIM0THY"     "V0NCIEAL"    "Y0LANDA"    
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "0"))
dim(x)
[1] 130   1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
 [1] "0"               "0  CYRUS"        "0'BRIAN"         "0'CONNOR"       
 [5] "0ATES"           "0DEL"            "0DELL"           "0MAE"           
 [9] "0ROURKE"         "0VERTON"         "10052004"        "103"            
[13] "205"             "2205"            "8017"            "ALEXANDER080572"
[17] "ALPHONS0"        "ANDERSON9104576" "ANN B0YD"        "ANTH0NY"        
[21] "AY0"             "BA0-KUO"         "C1010"           "CO0PER"         
[25] "COL0N"           "CR0XIN"          "D0N"             "D0RIS"          
[29] "D0UGLAS"         "DALE401"         "DEANGEL0"        "DEV0NA"         
[33] "DIO0NE"          "DON0HOO"         "EDWARDS1801"     "ELAINE1000"     
[37] "ELLI0TT"         "EMETRIC0"        "EN0"             "F0REST"         
[41] "FINLEY500 SU"    "FRANT0NIO"       "H0USTON"         "J0"             
[45] "J0 MARINOVIC"    "J0E"             "J0HN"            "J0NES"          
[49] "JONATH0N"        "JOYCE701"        "JUNI0R"          "L0CKAMY"        
[53] "L0UISE"          "LAM0ND"          "LAT0NYA"         "LAV0NE"         
[57] "LE0N"            "LEE3708"         "LORENZ0"         "LOUIS7100"      
[61] "LY0NS"           "LYNN1820"        "M00RE"           "M0NGE"          
[65] "M0NIQUE"         "M0RALES"         "MARIE103062"     "NICH0LE"        
[69] "NICH0LS"         "OCONN0R"         "ORLAND0"         "P0RTER"         
[73] "PESATUR0"        "R0BERT"          "R0CHELLE"        "R0DGERS"        
[77] "R0Y"             "ROBINS0N"        "ROSENBAUM3305"   "RUNY0N"         
[81] "SAMBRAN0"        "SC0TT"           "SCOTT3450"       "SH0RROD"        
[85] "T0DD"            "T0NY"            "TAYL0R"          "TH0MPSON"       
[89] "TOME0"           "V0SS"            "VALENTIN0"       "W00LARD"        
[93] "WAYNE030986"     "WRIGHT2106"      "Y0LONDA"         "Y0UNG"          
  • 270 names with zero
  • Occur in last, first, and middle names.
  • Most are zero substituted for O, e.g. J0HNSON, BURT0N
  • Some are pure numeric, e.g. 0, 01
  • Some are names with concatenated numeric, e.g. WAYNE030986, WRIGHT2106

Map zero to O if name contains at least one letter and no digits 1-9
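
As a minimal sketch of this rule (not the code used in the analysis), the helper below replaces zero with O only in names that have at least one letter and none of the digits 1-9. The function name `map_zero_to_o` is hypothetical, base R is used so the sketch runs without extra packages, and it assumes names are already upper case.

```r
# Hypothetical helper, not from the original analysis.
# Replace "0" with "O" only when the name contains at least one letter
# and none of the digits 1-9. Assumes names are upper case.
map_zero_to_o <- function(x) {
  eligible <- !is.na(x) & grepl("[A-Z]", x) & !grepl("[1-9]", x)
  x[eligible] <- gsub("0", "O", x[eligible], fixed = TRUE)
  x
}

map_zero_to_o(c("J0HNSON", "M00RE", "0", "WAYNE030986", NA))
# [1] "JOHNSON"     "MOORE"       "0"           "WAYNE030986" NA
```

Pure numeric values such as "0" and names with concatenated numerics such as "WAYNE030986" are left untouched by this step.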

3.6.3.3 Check for one

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "1"))
dim(x)
[1] 20  1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
 [1] "01"              "1"               "491715"          "971"            
 [5] "CARR  111"       "CHASTAIN 11"     "CLARK 111"       "COMER 111"      
 [9] "COX  1V"         "HINES 111"       "HOLLERS  111"    "LATTA 111"      
[13] "LEAK 111"        "MELTON 111"      "MV 5/17/95"      "PEELE 11"       
[17] "SATTERFIELD 111" "SPATCHER 111"    "TUCKER  11"      "WASHINGTON 111" 
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "1"))
dim(x)
[1] 3 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
[1] "DAVID 111" "ELIZABE1H" "ROSE1"    
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "1"))
dim(x)
[1] 163   1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
 [1] "10052004"        "103"             "11"              "111"            
 [5] "1V"              "8017"            "A 111"           "ANDERSON9104576"
 [9] "ANN155"          "B 11"            "B 111"           "C 111"          
[13] "C1010"           "D 11"            "DALE401"         "EDWARDS1801"    
[17] "ELAINE1000"      "EUGENE 11"       "FRANCIS 11"      "FRANKLIN 1V"    
[21] "H 11"            "H 111"           "HODGES 111"      "HOUSTON 11"     
[25] "HOYLE 111"       "J1-TO"           "JAMES 111"       "JONA1"          
[29] "JOYCE701"        "LOUIS7100"       "LYN451"          "LYNN1820"       
[33] "LYNN2513"        "M 111"           "M1"              "MARIE103062"    
[37] "MARION 111"      "MASON 111"       "MICHAEL146"      "N 111"          
[41] "NADINE DOUGLAS1" "OLLIE 111"       "RANDOLPH 111"    "RAY 1."         
[45] "ROYAL 111"       "T 111"           "THOMAS 111"      "VERNON 111"     
[49] "W 111"           "WILLIAM 11"      "WILLIAM 111"     "WILLIAM1"       
[53] "WM 111"          "WRIGHT2106"     
  • 186 names with one
  • Occur in last, first, and middle names.
  • Most are 1 substituted for I in generation suffix, e.g. COX 1V, CARR 111
  • Some are pure numeric, e.g. 01, 971
  • Some are wrongly parsed, e.g. MV 5/17/95
  • Some are names with concatenated numeric, e.g. LYNN2513, WRIGHT2106

Delete generation suffixes where possible
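
One possible way to do this, shown only as a sketch: drop a trailing, whitespace-separated token of two or three characters drawn from 1, I and V, which covers forms such as " 11", " 111" and " 1V" seen above. The pattern and the name `strip_generation_suffix` are assumptions for illustration, not the rule actually applied later.

```r
# Hypothetical helper: remove a trailing generation suffix in which
# "1" may stand in for "I" (e.g. " 11", " 111", " 1V", " III", " IV").
# A preceding space is required, so a value that is only "111" is kept.
strip_generation_suffix <- function(x) {
  sub("\\s+[1IV]{2,3}$", "", x)
}

strip_generation_suffix(c("CARR  111", "COX  1V", "PEELE 11", "LYNN2513", "111"))
# [1] "CARR"     "COX"      "PEELE"    "LYNN2513" "111"
```

Requiring at least two suffix characters is a deliberate choice in this sketch so that a single trailing initial such as " I" or " V" is not removed.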

3.6.3.4 Check for two

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "2"))
dim(x)
[1] 1 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
[1] "0000000072294"
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "2"))
dim(x)
[1] 1 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
[1] "MICHAEL DEAN 2"
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "2"))
dim(x)
[1] 13  1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
 [1] "10052004"        "205"             "2205"            "328"            
 [5] "4625"            "4932"            "ALEXANDER080572" "B2957"          
 [9] "LYNN1820"        "LYNN2513"        "MARIE103062"     "MIZELLE25248249"
[13] "WRIGHT2106"     
  • 15 names with two
  • Some are pure numeric, e.g. 205, 328
  • Some are names with concatenated numeric, e.g. LYNN1820, WRIGHT2106

3.6.3.5 Check for three

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "3"))
dim(x)
[1] 1 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
[1] "3"
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "3"))
dim(x)
[1] 0 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
character(0)
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "3"))
dim(x)
[1] 13  1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
 [1] "103"           "328"           "3RD."          "4932"         
 [5] "LEE3708"       "LYNN2513"      "MACK 3RD"      "MARIE103062"  
 [9] "MITCHELL368"   "ROSENBAUM3305" "SANFORD-3"     "SCOTT3450"    
[13] "WAYNE030986"  
  • 14 names with three
  • Some are pure numeric, e.g. 103, 328
  • Some are generation suffixes, e.g. 3RD., MACK 3RD
  • Some are names with concatenated numeric, e.g. LEE3708, SCOTT3450

3.6.3.6 Check for four

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "4"))
dim(x)
[1] 3 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
[1] "0000000072294" "491715"        "4MCMANUS"     
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "4"))
dim(x)
[1] 1 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
[1] "FR4ANK"
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "4"))
dim(x)
[1] 15  1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
 [1] "10052004"        "4625"            "4932"            "ANDERSON9104576"
 [5] "ANN BURTON47"    "ARGUS 4TH"       "DALE401"         "JAM4S"          
 [9] "LYN451"          "MCREE 4"         "MICHA4EL"        "MICHAEL146"     
[13] "MIZELLE25248249" "SCOTT3450"       "TE4S"           
  • 19 names with four
  • Some are pure numeric, e.g. 4625, 4932
  • Some are generation suffixes, e.g. ARGUS 4TH, MCREE 4
  • Some are names with concatenated numeric, e.g. DALE401, SCOTT3450
  • Some are intrusions in names, e.g. FR4ANK, MICHA4EL

3.6.3.7 Check for five

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "5"))
dim(x)
[1] 3 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
[1] "491715"     "ALBER5TSON" "MV 5/17/95"
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "5"))
dim(x)
[1] 0 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
character(0)
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "5"))
dim(x)
[1] 17  1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
 [1] "(NMN)5TH"        "10052004"        "205"             "2205"           
 [5] "4625"            "ALEXANDER080572" "ANDERSON9104576" "ANN155"         
 [9] "B2957"           "FINLEY500 SU"    "LUTHER5"         "LYN451"         
[13] "LYNN2513"        "MIZELLE25248249" "ROSENBAUM3305"   "SCOTT3450"      
[17] "W5RAY"          
  • 20 names with five
  • Some are pure numeric, e.g. 205, 2205
  • Some are generation suffixes, e.g. (NMN)5TH
  • Some are names with concatenated numeric, e.g. ANN155, SCOTT3450
  • Some are wrongly parsed, e.g. MV 5/17/95
  • Some are intrusions in names, e.g. W5RAY
  • Some are substitution of 5 for S, e.g. ALBER5TSON

3.6.3.8 Check for six

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "6"))
dim(x)
[1] 1 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
[1] "6"
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "6"))
dim(x)
[1] 1 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
[1] "RETT6A"
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "6"))
dim(x)
[1] 7 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
[1] "4625"            "ANDERSON9104576" "MARIE103062"     "MICHAEL146"     
[5] "MITCHELL368"     "WAYNE030986"     "WRIGHT2106"     
  • 9 names with six
  • Some are pure numeric, e.g. 6, 4625
  • Some are names with concatenated numeric, e.g. MICHAEL146, MITCHELL368

3.6.3.9 Check for seven

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "7"))
dim(x)
[1] 4 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
[1] "0000000072294" "491715"        "971"           "MV 5/17/95"   
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "7"))
dim(x)
[1] 0 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
character(0)
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "7"))
dim(x)
[1] 8 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
[1] "8017"            "ALEXANDER080572" "ANDERSON9104576" "ANN BURTON47"   
[5] "B2957"           "JOYCE701"        "LEE3708"         "LOUIS7100"      
  • 12 names with seven
  • Some are pure numeric, e.g. 491715, 971
  • Some are names with concatenated numeric, e.g. JOYCE701, LOUIS7100
  • Some are wrongly parsed, e.g. MV 5/17/95

3.6.3.10 Check for eight

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "8"))
dim(x)
[1] 0 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
character(0)
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "8"))
dim(x)
[1] 2 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
[1] "BEA LOUI8" "J8IMMIE"  
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "8"))
dim(x)
[1] 9 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
[1] "328"             "8017"            "ALEXANDER080572" "EDWARDS1801"    
[5] "LEE3708"         "LYNN1820"        "MITCHELL368"     "MIZELLE25248249"
[9] "WAYNE030986"    
  • 11 names with eight
  • Some are pure numeric, e.g. 328, 8017
  • Some are names with concatenated numeric, e.g. LEE3708, LYNN1820
  • Some are intrusions in names, e.g. J8IMMIE
  • Some might be substitution of 8 for SE, e.g. BEA LOUI8

3.6.3.11 Check for nine

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "9"))
dim(x)
[1] 4 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
[1] "0000000072294" "491715"        "971"           "MV 5/17/95"   
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "9"))
dim(x)
[1] 0 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
character(0)
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "9"))
dim(x)
[1] 6 1
x %>%   
  dplyr::distinct() %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
[1] "4932"            "ANDERSON9104576" "B2957"           "LO9UIS"         
[5] "MIZELLE25248249" "WAYNE030986"    
  • 10 names with nine
  • Some are pure numeric, e.g. 971, 4932
  • Some are names with concatenated numeric, e.g. ANDERSON9104576, WAYNE030986
  • Some are wrongly parsed, e.g. MV 5/17/95
  • Some are intrusions in names, e.g. LO9UIS

3.6.3.12 Digit summary

Map zero to O if there are any letters and no digits 1-9

One is sometimes substituted for “I” in generation suffixes. Remove these suffixes from names.

Otherwise, map all digits to empty string.
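
The three rules interact, so the order of application matters. The sketch below is one plausible ordering under stated assumptions: suffix removal first, then the zero rule, then deletion of remaining digits. The function name `clean_digits` and the suffix pattern are hypothetical, and base R is used so that it runs without extra packages.

```r
# Hypothetical sketch of the digit rules, applied in a fixed order.
clean_digits <- function(x) {
  # 1. Remove a trailing generation suffix written with 1 for I
  #    (assumed pattern: two or three characters from 1, I, V).
  x <- sub("\\s+[1IV]{2,3}$", "", x)
  # 2. Map zero to O where the name has a letter and no digits 1-9.
  zero_ok <- !is.na(x) & grepl("[A-Z]", x) & !grepl("[1-9]", x)
  x[zero_ok] <- gsub("0", "O", x[zero_ok], fixed = TRUE)
  # 3. Otherwise, delete any digits that remain.
  gsub("[0-9]", "", x)
}

clean_digits(c("M00RE", "COX  1V", "FR4ANK", "WRIGHT2106", "971"))
# [1] "MOORE"  "COX"    "FRANK"  "WRIGHT" ""
```

If digits were deleted first, "M00RE" would become "MRE" and "COX  1V" would keep a stray "V", which is why the deletion step comes last here. Pure numeric values end up as empty strings; how those should be treated is not decided in this sketch.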

3.6.4 Characters

3.6.4.1 Check for hyphens

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "-"))
dim(x)
[1] 34325     1
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
  [1] "AB-HUGH"           "AB-HUGH"           "ABDUL-GHAFFAR"    
  [4] "ABDUL-GHAFFAR"     "ABDUL-KARRIEM"     "ABDUL-RABB"       
  [7] "ABDUL-RAHIM"       "ABDUL-RAHIN"       "ABDUL-RAHMAN"     
 [10] "ABDUL-SALAAM"      "ABDUL-SALAM"       "ABDUL-WAHID"      
 [13] "ABDUR-RAHIM"       "ABDUR-RAHMAN"      "ABU-DAMES"        
 [16] "ABU-SABA"          "ABU-SABA"          "ABU-SABA"         
 [19] "ADAMS-CASKIE"      "ADAMS-MYERS"       "AFRICA-FLOYD"     
 [22] "AL-AWAR"           "AL-AWAR"           "AL-AWAR"          
 [25] "AL-KURDI"          "AL-SAADI"          "AL-SAADI"         
 [28] "ALBERT-KEULAN"     "ALSTON-EATMON"     "ANDERSON-TESH"    
 [31] "APPLEWHITE-LEWIS"  "ARDITO-BARLETTA"   "ARMSTRONG-VANN"   
 [34] "ARTHUR-CORNETT"    "ASKINS-MYRICK"     "AWTREY-KIRKMAN"   
 [37] "BAILEY-BROOKS"     "BARNARD-BAILEY"    "BENNETT-CLOWNEY"  
 [40] "BENTLEY-HALE"      "BIBB-FREEMAN"      "BLAKE-HASKINS"    
 [43] "BLEKFELD-SZTRAKY"  "BLEVINS-SPRINKLE"  "BLUE-SWANN"       
 [46] "BRADY-WILSON"      "BROWN-CORNELIUS"   "BRUCE-ROSS"       
 [49] "BUCKLEY-MOORE"     "CLARK-BARKER"      "CLAUDIO-DIAZ"     
 [52] "CLAUDIO-DIAZ"      "CLAUDIO-DIAZ"      "CLAUDIO-DIAZ"     
 [55] "COLE-MORGAN"       "CROWELL-SMITH"     "DAVIS-BOYD"       
 [58] "DAVIS-PARKER"      "DAVIS-ROBINSON"    "DUFFER-LEECHFORD" 
 [61] "EATON-ALSTON"      "ELLIS-WALLACE"     "ENGEL-BAKER"      
 [64] "GILLIS-HENDELL"    "GORDON-WICKER"     "GREEN-HOLLEY"     
 [67] "GUPTA-THOMAS"      "HARGETT-LILLY"     "HIATT-CRIBBS"     
 [70] "JONES-ALEXANDER"   "JONES-SUTTON"      "KELLER-HULL"      
 [73] "KOSKI-PONTON"      "KUCERA-HOFFMANN"   "LAWS-GRIFFIN"     
 [76] "LEARY-SMITH"       "LIDE-GRANT"        "LITTON-MCKENZIE"  
 [79] "LOCKLEAR-CASEY"    "LOCKLEAR-CRABTREE" "MANESS-LITTLE"    
 [82] "MAYNOR-BOWEN"      "MILLS- KHARBAT"    "MURPHY-GRAY"      
 [85] "PARKER-LOWE"       "PARRA-ASH"         "POOLE-JENKINS"    
 [88] "POPISH-SMITH"      "RAY-LEAZER"        "REDFEARN- SHELTON"
 [91] "RIDDICK-HARRELL"   "RIVERA-MONTORO"    "SEVORES-AMMONS"   
 [94] "SORRELLS-COOPER"   "STEPHENS-HORTON"   "TIPTON- BARNARD"  
 [97] "TOMBLIN-WELLMAN"   "WALLIS-JOHNSON"    "WATKINS-AKERS"    
[100] "WHITAKER-LINDSAY" 
  • ~34k last names with hyphens
  • Look like legitimately hyphenated last names
  • Some hyphenated last names have an extra space, e.g. “TIPTON- BARNARD”
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "-"))
dim(x)
[1] 5298    1
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
  [1] "ABDU-JAMIA"      "ABDUL-RAHEEM"    "AMIE-EMILY"      "AMY-MARIE"      
  [5] "ANNE-MARIE"      "ANNIE-BERT"      "ANNIE-MARIE"     "AR-RASHIED"     
  [9] "BARBARA-ANN"     "BESSIE-RUTH"     "BILLIE-JOE"      "CHARLOTTE-ANN"  
 [13] "CHRISTI-JO"      "DONNA-F"         "E-LEARN"         "EASTER-MAE"     
 [17] "ELLER-WEASE"     "EMILY-GAY"       "EMMA-LEE"        "ETHEL-MAE"      
 [21] "EUPHUR-MAE"      "GLO-LINDA"       "GRACE-EVELYN"    "HATTIE-BELL"    
 [25] "HENRY-ETTA"      "IMO-JEAN"        "INGA-LISA"       "JA-NET"         
 [29] "JANE-TTE"        "JEAN-ANN"        "JO-ANN"          "JO-ANN"         
 [33] "JO-ANN"          "JO-ANN"          "JO-ANN"          "JO-ANNE"        
 [37] "JO-DEAN"         "JO-LYNN"         "JOHN-EDWARD"     "JON-MARK"       
 [41] "JOSEPHA-JUANITA" "JUDITH-ANN"      "KRIS-TINA"       "LA-RITA"        
 [45] "LO-ETTA"         "LORI-ANN"        "LORI-ANN"        "LOU-ANN"        
 [49] "LOU-ANNE"        "LU-ANN"          "LUE-MYRTLE"      "LULA-MAE"       
 [53] "MAE-BELLE"       "MAE-WILLIE"      "MAMIE-LEE"       "MAN-SHUN"       
 [57] "MARI-AN"         "MARI-MARTHA"     "MARY-AGNES"      "MARY-ANN"       
 [61] "MARY-CELESTE"    "MARY-E"          "MARY-ELLEN"      "MARY-JO"        
 [65] "MARY-KATHERINE"  "MARY-KELLAM"     "MARY-LIZZIE"     "MARY-LOUISE"    
 [69] "MARY-M"          "MARY-RUTH"       "MEI-HSUEH"       "OK-CHA"         
 [73] "PATRICIA-GAY"    "PATSY-DAWN"      "PORTER-C"        "RICHARD-OLIVE G"
 [77] "ROSA-BELLE"      "SALLIE-MAE"      "SALLY-MARIE"     "SARA-LATRIC"    
 [81] "SARAH-E"         "SHAE-LYNN"       "SHELIA-RENE"     "SHERRY-ANN"     
 [85] "SHIRLEY-JEA"     "SHIRLEY-MAE"     "SONJA-KAYE"      "STACY-LYNN"     
 [89] "STUART-MORGAN"   "SUE-ELLEN"       "TA-TANISHA"      "TAMELA-LYNN"    
 [93] "TESSIE-MAE"      "TINA-DIANNE"     "TONI-PAT"        "VITA-JOAN"      
 [97] "W M-MRS"         "WANDA-D"         "WANDA-SUE"       "WILLIE-P"       
  • ~5k first names with hyphens
  • Look like legitimately hyphenated first names
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "-"))
dim(x)
[1] 6304    1
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
  [1] "A - BERTHA"      "ADELL-CARTER"    "AL-MUHJA"        "ALICE-BROOKS"   
  [5] "ANN-MARIE"       "ANN-RICE"        "ANN-SCOTT"       "ANN-SIGMON"     
  [9] "ANNE-"           "ANNE-GIBSON"     "ANNE-SANDQUIST"  "ANNIE-J"        
 [13] "ARNESSIA-VENISS" "ARTHEA-HOKE"     "BEANE-BROWN"     "BING-YUEN"      
 [17] "BRICKHOUSE-"     "BRYANT-"         "CAROL-LAWS"      "CAROL-ODOM"     
 [21] "CAROLINE-BR"     "CARY-ELWEIS"     "CULPEPPER-MCNAY" "DE-RAY"         
 [25] "DEE-ANN"         "DENEASE-THO"     "DENISE--WINDE"   "DERRICK-PATRICK"
 [29] "DIANE-HENSLEY"   "DIANE-WEBSTER"   "DILLARD-COLLIER" "E-CLINTON"      
 [33] "EDITH-MORGAN"    "EDNA-RAMSEY"     "ELAINE-BROOKS"   "EMMA-DIXON"     
 [37] "F-CRIBB"         "GAIL-QUEEN"      "GLENN - RUTH"    "GWEN-WILSON"    
 [41] "H - JACK"        "IRENE-HIGGIN"    "JANET-HOUGH"     "JEAN-BANKS"     
 [45] "JEAN-HANEY"      "JEAN-TIPTON"     "JEAN-WILLIS"     "JEANNEANE-BRYSO"
 [49] "JENENE-FENDER"   "JO-FREEMAN"      "JO-HAWKINS"      "JO-MACE"        
 [53] "JOSEPH-LEE"      "KAREN-RIDDLE"    "KAY-BYERS"       "KAY-WORDEN"     
 [57] "L-LEWIS"         "LA-SHONDA"       "LA-TISHA"        "LA-VETTA"       
 [61] "LAHOCINSKY-C"    "LANETTE-JORDAN"  "LE-ANN"          "LEE-FOX"        
 [65] "LEIGH-HENSLE"    "LIDDY-SILVERS"   "LORETTA-FENDER"  "LOU-BALLEW"     
 [69] "LOUISE-LOWMAN"   "LU-BRUCE"        "LYNN-AMMONS"     "LYNN-BOONE"     
 [73] "LYNN-DEYTON"     "LYNN-HOPSON"     "MAE-LUCAS"       "MALIQK-MUHAM"   
 [77] "MARIE-HILEMO"    "MARIE-JONES"     "MARIE-KNESS"     "MARIE-ROBINSON" 
 [81] "MARY-ALLEN"      "MAUD-MANIS"      "MAY-OVERMYER"    "MICHELLE-BARTLE"
 [85] "MING-LI"         "PEAKE-STEPHENS"  "R-ALICE"         "REBECCA-ANN"    
 [89] "REE-NORTON"      "RENEE-BUCHANAN"  "RICHARD-LEE"     "RIDER-HALL"     
 [93] "RITA-MESSER"     "ROBERTS-BROWN"   "RUTH-WILSON"     "S-WOODBY"       
 [97] "SHIRL-LYNN"      "SYBIL-ADAMS"     "WILLIAM-DEMO"    "WILLIS-BRADSHER"
  • ~6k middle names with hyphens
  • Look like legitimately hyphenated middle names
  • Some hyphenated middle names have an extra space, e.g. “A - BERTHA”, “H - JACK”

I suspect that hyphenation is unreliable in transcription.

Map hyphen to empty string

3.6.4.2 Check for slash

d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "/"))
# A tibble: 46 x 1
   last_name        
   <chr>            
 1 GARNER/MCGRAW    
 2 RHONEY/PETERS    
 3 MV 5/17/95       
 4 SIDI/HIDA        
 5 STUTLER/JAGGERS  
 6 MORRIS/BLOOM     
 7 BRINKLEY/BAGGS   
 8 RAMSEY/DOBERT    
 9 WATERS/CRUZ      
10 BRITTAIN/SPRINKLE
# … with 36 more rows
d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "/"))
# A tibble: 9 x 1
  first_name  
  <chr>       
1 MARY/LISA   
2 LINDA SUSAN/
3 MARVIN/HENRY
4 JU/WANE     
5 TINA /LEA   
6 BRENDA KAY/ 
7 LISA MARIE/ 
8 MARY SUSAN/ 
9 LISA/MELISSA
d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "/"))
# A tibble: 1,032 x 1
   midl_name   
   <chr>       
 1 ANNE/MORGAN 
 2 WILLIAM/MCKO
 3 F./MARTIN   
 4 LEANN/STYLES
 5 LEE/ DEBBY  
 6 LOUISE/MORRI
 7 BENGE/CRAIG 
 8 LEE/FALLS   
 9 SHELTON/DEW 
10 PEARL/CARR  
# … with 1,022 more rows
  • 46 last names with slash
  • One obviously badly parsed - MV 5/17/95
  • The remainder use the slash as the equivalent of a hyphen or to indicate a former name, e.g. MORRIS/BLOOM, WATERS/CRUZ

Map slash to empty string

3.6.4.3 Check for underscore

d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(last_name, "_"))
# A tibble: 1 x 4
  last_name      first_name midl_name sex   
  <chr>          <chr>      <chr>     <chr> 
1 SOLARZ_VOJDANI JENNIFER   S         FEMALE
d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(first_name, "_"))
# A tibble: 17 x 4
   last_name     first_name    midl_name sex   
   <chr>         <chr>         <chr>     <chr> 
 1 PINNELL       KEVIN_C       <NA>      MALE  
 2 FRANKENBERGER JENNIFER_L    <NA>      FEMALE
 3 SICARD        MICHAEL_W     <NA>      MALE  
 4 STRUNK        WENDY_ANNE    <NA>      FEMALE
 5 SCHWARTING    MICHAEL_EDWIN <NA>      MALE  
 6 AMICK         KRISTEN_      W         FEMALE
 7 O'HARA        KELLI__D      <NA>      FEMALE
 8 RICHARDS      DEAN_ALLEN    <NA>      MALE  
 9 SPILMAN       HEATHER_MARIE <NA>      FEMALE
10 HINSHAW       DEAN__ALAN    <NA>      MALE  
11 DAWSON        URSULA_M      <NA>      FEMALE
12 ROWE          DAVID_R       <NA>      MALE  
13 KRAUSS        REBECCA_REESE <NA>      FEMALE
14 MCKINNEY      MARY_B        <NA>      FEMALE
15 KENNEDY       ESSIE_B       <NA>      FEMALE
16 ALFORD        NICOLE_M      <NA>      FEMALE
17 WOODS         TE_KISHA      Y         FEMALE
d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(midl_name, "_"))
# A tibble: 3 x 4
  last_name first_name midl_name    sex   
  <chr>     <chr>      <chr>        <chr> 
1 GAINES    DEBORAH    ARNETTE_     FEMALE
2 MOSS      REX        NICHOLAS_TUC MALE  
3 KILE      JONES      EDWARD_M     MALE  
  • 21 names with underscore
  • Used as the equivalent of a hyphen in one last name: SOLARZ_VOJDANI
  • Mostly used as the equivalent of a space in other names, e.g. KEVIN_C, DEAN__ALAN

Map underscore to empty string

3.6.4.4 Check for percent

d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(last_name, "%"))
# A tibble: 1 x 4
  last_name     first_name midl_name sex   
  <chr>         <chr>      <chr>     <chr> 
1 SCHERM%MARTIN WYATT      <NA>      FEMALE
d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(first_name, "%"))
# A tibble: 4 x 4
  last_name first_name midl_name sex   
  <chr>     <chr>      <chr>     <chr> 
1 BENDING   ANN%       LAWTON    FEMALE
2 JOHNSON   P%         DONALD    MALE  
3 MACLEAN   DAV%       STUART    MALE  
4 JEFFERSON EVE%       <NA>      FEMALE
d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(midl_name, "%"))
# A tibble: 0 x 4
# … with 4 variables: last_name <chr>, first_name <chr>, midl_name <chr>,
#   sex <chr>
  • 5 names with percent
  • Used as the equivalent of a hyphen in one last name: SCHERM%MARTIN
  • Possibly used as a substitute for E, e.g. DAV%, ANN%
  • Sometimes appears to be a pointless suffix, e.g. P%, EVE%

Map percent to empty string

3.6.4.5 Check for single quotes

x <- d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "'"))
dim(x)
[1] 9712    1
x %>%   
  dplyr::distinct() %>% 
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name) %>% 
  dplyr::pull(last_name)
  [1] "BOURR'E"           "BOVE'"             "D'ALPHE"          
  [4] "D'AMBROSIO"        "D'AMICO"           "D'ANGELO"         
  [7] "D'ANGIO"           "D'ANNUNZIO"        "D'ANTIGNAC"       
 [10] "D'ARCO"            "D'ARMOND"          "D'ARVILLE"        
 [13] "D'ASCOLI"          "D'AUGUSTA"         "D'AURIA"          
 [16] "D'AUTRECHY"        "D'AVANZO"          "D'EMPAIRE"        
 [19] "D'ERCOLE"          "D'HEMECOURT"       "D'IGNAZIO"        
 [22] "D'INDIA"           "D'ONOFRIO"         "D'SANT"           
 [25] "DEBELL-O'NEAL"     "DEL RE'"           "DELL'OSSO"        
 [28] "DUARTE'"           "L'ETOILE"          "L'HUILLIER"       
 [31] "LACHARITE'-OTWELL" "O' NEAL"           "O'BANION"         
 [34] "O'BANNON"          "O'BERRY"           "O'BRIAN"          
 [37] "O'BRIANT"          "O'BRIEN"           "O'BRYAN"          
 [40] "O'BRYANT"          "O'BRYON"           "O'BYRNE"          
 [43] "O'CARROLL"         "O'CONNEL"          "O'CONNELL"        
 [46] "O'CONNER"          "O'CONNOR"          "O'CONWELL"        
 [49] "O'DANIEL"          "O'DEA"             "O'DEAR"           
 [52] "O'DEAR BROOKS"     "O'DELL"            "O'DOM"            
 [55] "O'DONALD"          "O'DONNEL"          "O'DONNELL"        
 [58] "O'DRISCOLL"        "O'FARRELL"         "O'FERRELL"        
 [61] "O'GARA"            "O'GEARY"           "O'GRADY"          
 [64] "O'GUIN"            "O'GWYNN"           "O'HARA"           
 [67] "O'HERN"            "O'KANE"            "O'KEEFE"          
 [70] "O'KELLEY"          "O'KELLY"           "O'KONEK"          
 [73] "O'LAUGHLIN"        "O'LEARY"           "O'MAHONY"         
 [76] "O'MARA"            "O'NEAL"            "O'NEAL-BIGGS"     
 [79] "O'NEAL-CLEMENTS"   "O'NEAL-WRIGHT"     "O'NEIL"           
 [82] "O'NEILL"           "O'PHARROW"         "O'QUIN"           
 [85] "O'QUINN"           "O'REAR"            "O'REILLY"         
 [88] "O'RILEY"           "O'RORK"            "O'ROUKE"          
 [91] "O'ROURKE"          "O'SHAUGHNESSY"     "O'SHEA"           
 [94] "O'SHIELD"          "O'SHIELDS"         "O'STEEN"          
 [97] "O'SULLIVAN"        "O'TOOLE"           "O'TUEL"           
[100] "SOLLE'"           
x <- d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "'"))
dim(x)
[1] 1965    1
x %>%   
  dplyr::distinct() %>% 
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(first_name) %>% 
  dplyr::pull(first_name)
  [1] "ANDR'E"         "ANDRE'"         "ANDRE'A"        "ANDRE'DEVON"   
  [5] "AR'RECOZELL"    "B'LINDA"        "CE'MONIA"       "CHARLA'"       
  [9] "CHERYL RENE'"   "CHIMENE'"       "CHOUV'ON"       "CHRISTINE'"    
 [13] "D'ANDRE"        "D'ANDREA"       "D'ANNA"         "D'ANNE"        
 [17] "D'AYANA"        "D'CRAYTON"      "D'ETTA"         "D'ETTE"        
 [21] "D'JUAN"         "D'NISE"         "DA'QUON"        "DANTE'"        
 [25] "DE'ALLO"        "DE'ONDRA"       "DE'QUAN"        "DE'SHUN"       
 [29] "DE'VONNA"       "DEAN'NA"        "DENA'"          "DESIRE'"       
 [33] "DONTE'"         "EL'MIRA"        "EL'VERTA"       "ENDRE'"        
 [37] "HONORE'"        "J'MEKA"         "JA'COBIE"       "JA'NET"        
 [41] "JANA'"          "JE'CISKEN"      "JE'KEITH"       "JO'AN"         
 [45] "JOSE'"          "KA'AUNNE"       "KA'TINA"        "KIELEE'"       
 [49] "L'AMARI"        "L'CRISH"        "L'LENA"         "L'TANJA"       
 [53] "L'TANYA"        "L'TASHA"        "L'VON"          "LA'TISHA"      
 [57] "LE'RON"         "LE'TRINA"       "LU'SHELL"       "MARE'"         
 [61] "MARIA-JOSE'"    "MONCHE'"        "O'BERA"         "O'BERRY"       
 [65] "O'BRYANT"       "O'DELL"         "O'DELLA"        "O'DEYNE"       
 [69] "O'GENE"         "O'JAY"          "O'KEITHA"       "O'KELLY"       
 [73] "O'LEMA"         "O'NEAL"         "O'NEIL"         "O'NEILL"       
 [77] "O'NICA"         "O'NICHOLUS"     "O'RITA"         "O'TIKA"        
 [81] "R'DELL"         "RENA'"          "RENE'"          "RENEE'"        
 [85] "RENNA'"         "SA'MAAD"        "SADE'"          "SHA'RON"       
 [89] "SHANA'"         "SHANEE'"        "SHARON RE'NEE"  "SHAUN'DERRIC"  
 [93] "SHAWNTA'"       "SHELE'"         "SHEREA'"        "T'KISHA"       
 [97] "TA'AISHA"       "TA'MAIRA"       "TA'RE"          "VIVIAN 'VIKKI'"
x <- d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "'"))
dim(x)
[1] 5426    1
x %>%   
  dplyr::distinct() %>% 
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(midl_name) %>% 
  dplyr::pull(midl_name)
  [1] "AIME'"        "ANDRE'"       "ANSHAWNA'"    "BOVA'"        "BRECK'RIDG"  
  [6] "D'ABBRACCI"   "D'AGOSTINO"   "D'ANNA"       "D'EMILIO"     "D'NELL"      
 [11] "DA'NANT"      "DANTE'"       "DE'NANG"      "DE'QUAN"      "DEANDRE'"    
 [16] "DEE O'NEAL"   "DENE'"        "DENEE'"       "DU'WANN"      "HARDR'E"     
 [21] "JA'NELLE"     "JA'NET"       "JAU'CONNIE"   "JEANNE'"      "JENEE'"      
 [26] "JERNIQUE'"    "JOAN O'GRADY" "JOSE'"        "L'REE"        "L'VONNE"     
 [31] "LA'RONDA"     "LA'SHAUN  A"  "LA'TESE"      "LA'VETTE"     "LA'VONNE"    
 [36] "LE'SHON"      "LE'VELLE"     "LE'VONE"      "LEON'"        "MAR'CEL"     
 [41] "O'B."         "O'BERRY"      "O'BOYLE"      "O'BREIN"      "O'BRIAN"     
 [46] "O'BRIEN"      "O'BRIN"       "O'BRYAN"      "O'BRYANT"     "O'BRYHIM"    
 [51] "O'BRYON"      "O'CARROLL"    "O'CONNELL"    "O'CONNER"     "O'CONNOR"    
 [56] "O'DANIEL"     "O'DAY"        "O'DELL"       "O'DIEAR"      "O'GAIL"      
 [61] "O'GEARY"      "O'GRADY"      "O'HANLON"     "O'HARA"       "O'HARROLD"   
 [66] "O'KEITH"      "O'LERA"       "O'MALLEY"     "O'MARY"       "O'MAX"       
 [71] "O'MICHAEL"    "O'NEAL"       "O'NEATHA"     "O'NEIL"       "O'NEILL"     
 [76] "O'NETRUSE"    "O'NIEL"       "O'QUINN"      "O'REILLY"     "O'RILEY"     
 [81] "O'RONALD"     "O'SHEA"       "O'SHEAL"      "O'SHIELDS"    "O'STEEN"     
 [86] "O'TUEL"       "REN'EE"       "RENA'"        "RENE'"        "RENE'/WRIGHT"
 [91] "RENE'E"       "RENEE'"       "RENEE'/WILSO" "RENEE'BROWN"  "RUNEE'"      
 [96] "SHANTRE'"     "TRENNE'"      "U'TAY"        "VANAE'"       "VELINA'"     
  • ~17k names with single quotes
  • Most look like correct names, e.g. O’NEAL, D’AGOSTINO
  • Some have a terminal quote, e.g. BOVE’, BOVA’
  • Some have extra space and/or a hyphen, e.g. “O’ NEAL”, “LACHARITE’-OTWELL”

I suspect that quotes are unreliable in transcription.

Map single quote to empty string
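
The mappings chosen so far for hyphen, slash, underscore, percent and single quote all have the same form, so they can be sketched as one helper. This is an illustration only: `remove_chars` is a hypothetical name, and `fixed = TRUE` is used so that each character is matched literally rather than as a regular expression.

```r
# Hypothetical helper: delete each listed character wherever it occurs.
# fixed = TRUE treats each character literally, so no regex escaping is needed.
remove_chars <- function(x, chars = c("-", "/", "_", "%", "'")) {
  for (ch in chars) x <- gsub(ch, "", x, fixed = TRUE)
  x
}

remove_chars(c("O'NEAL", "O' NEAL", "TIPTON- BARNARD",
               "MORRIS/BLOOM", "KEVIN_C", "DAV%"))
# [1] "ONEAL"          "O NEAL"         "TIPTON BARNARD" "MORRISBLOOM"
# [5] "KEVINC"         "DAV"
```

Note that "O'NEAL" and "O' NEAL" still differ after this step because the space remains; whether whitespace should also be removed is a separate decision that this sketch does not make.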

3.6.4.6 Check for double quotes

d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "\""))
# A tibble: 1 x 1
  last_name
  <chr>    
1 "LA\"BEE"
d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "\""))
# A tibble: 4 x 1
  first_name       
  <chr>            
1 "HENRYL\""       
2 "MARY (\"PETE\")"
3 "\"C\""          
4 "GEMES \"BO\""   
d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "\""))
# A tibble: 19 x 1
   midl_name       
   <chr>           
 1 "\"ED\""        
 2 "\"YVONNE\""    
 3 "\"WALANIA\""   
 4 "\"DREW\""      
 5 "\"TATER\""     
 6 "\"JOHNNY\""    
 7 "\"HICK\""      
 8 "\"LOY\""       
 9 "CARL  \"PETE\""
10 "\"DIANE\""     
11 "R \"FRANCES\"" 
12 "\"CECIL\""     
13 "\"SCOTT\""     
14 "\"ALICIA\""    
15 "\"SHARON\""    
16 "W \"BETSY\""   
17 "\"NEEL\""      
18 "\"RENA\" NEWSO"
19 "ALLEN \"JAKE\""

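The backslashes are escape characters that R inserts automatically when printing, so that the output strings could be read back in as input without the embedded double quotes being mistaken for string delimiters. They are not part of the stored names.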
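
A small base R illustration of this point, using the last name from the output above: the backslash appears only in the printed representation, not in the stored string.

```r
x <- "LA\"BEE"

nchar(x)        # 6: L, A, ", B, E, E (the backslash is not stored)
print(x)        # [1] "LA\"BEE"  (escaped form, readable back as input)
writeLines(x)   # LA"BEE         (the raw characters)

# No literal backslash is present in the string itself.
grepl("\\", x, fixed = TRUE)   # FALSE
```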

  • 24 names with double quotes
  • Some substitute double for single quote, e.g. the last name LA"BEE probably should have been LA’BEE
  • Some appear to be aliases or nicknames, e.g. MARY (“PETE”), “TATER”

Map all double quotes to empty string

3.6.4.7 Check for asterisk

d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "\\*"))
# A tibble: 7 x 1
  last_name
  <chr>    
1 O*TOOLE  
2 O*TOOLE  
3 O*NEAL   
4 O*MASTERS
5 D*AMICO  
6 D*AMICO  
7 O*BRIEN  
d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "\\*"))
# A tibble: 1 x 1
  first_name
  <chr>     
1 TOM*      
d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "\\*"))
# A tibble: 23 x 1
   midl_name
   <chr>    
 1 WAYNE*   
 2 WAYNE*   
 3 WAYNE*   
 4 DAVID*   
 5 DEAN*    
 6 BARE*    
 7 WAYNE*   
 8 RAY*     
 9 RANDALL* 
10 ALLEN*   
# … with 13 more rows
  • 31 names with asterisk
  • Asterisk substituted for single quote in last names, e.g. O*NEAL, D*AMICO
  • Asterisk used as a suffix in first and middle names

Map asterisk to empty string
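
Because `*` is a regex quantifier, the checks above escape it as `\\*`. The sketch below, assuming base R and a hypothetical sample of values from the output above, shows that the escaped regex and a literal (fixed) match remove the same characters.

```r
# Hypothetical sample built from names shown in the output above
x <- c("O*TOOLE", "D*AMICO", "TOM*", "WAYNE*")

# Escaped regex: the double backslash makes the asterisk literal
via_regex <- gsub("\\*", "", x)

# Fixed match: no escaping needed because no regex is involved
via_fixed <- gsub("*", "", x, fixed = TRUE)

print(via_regex)
print(identical(via_regex, via_fixed))
```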

3.6.4.8 Check for back-tick

d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(last_name, "`"))
# A tibble: 10 x 4
   last_name first_name midl_name sex   
   <chr>     <chr>      <chr>     <chr> 
 1 O`BRIANT  DIANE      JACKSON   FEMALE
 2 O`BRIANT  WILLIAM    TAYLOR    MALE  
 3 WOODARD`  JASON      WARREN    MALE  
 4 PUCKETT`  LEANDRA    DELANCE   FEMALE
 5 BRYANT`   WILLIAM    STEWART   MALE  
 6 GODWIN`   PATRICIA   YOUNG     FEMALE
 7 MORRISON` HAZEL      M         FEMALE
 8 BOYLES`   LINDA      BROWN     FEMALE
 9 HARRISON` TRACI      ANN       FEMALE
10 CASEY`    LONNIE     GREGORY   MALE  
d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(first_name, "`"))
# A tibble: 71 x 4
   last_name first_name midl_name sex   
   <chr>     <chr>      <chr>     <chr> 
 1 MANGUM    RENEE`     GUPTON    FEMALE
 2 CAUSIN    `ROBERT    GALE      MALE  
 3 OVERTON   RENEE`     ANN       FEMALE
 4 BRADSHAW  RENEE`     LUFFMAN   FEMALE
 5 WALLAACE  ANNA`      <NA>      FEMALE
 6 YOUNG     MICHELLE`  <NA>      FEMALE
 7 HARRIS    LE`ANDRA   RACHELE   FEMALE
 8 BALLARD   `MARY      J         FEMALE
 9 MILTON    STEPHEN`   GLENN     MALE  
10 DICKEY    `BETTY     JANE      FEMALE
# … with 61 more rows
d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(midl_name, "`"))
# A tibble: 33 x 4
   last_name first_name midl_name sex   
   <chr>     <chr>      <chr>     <chr> 
 1 COOPER    MAXINE     REN`E     FEMALE
 2 SLAUGHTER WILBUR     O`NEAL    MALE  
 3 HILLIARD  TIMOTHY    O`NEAL    MALE  
 4 WILLIAMS  RODNEY     O`NEAL    MALE  
 5 AXTELL    JENNIFER   SERRE`    FEMALE
 6 BRASWELL  DENNIS     O`NEAL    MALE  
 7 HUMPHRIES CHARLES    O`NEAL    MALE  
 8 SUTTLES   CONNIE     RENE`     FEMALE
 9 THORN     ANDREA     RENEE`    FEMALE
10 GASTER    LISA       REN`EE    FEMALE
# … with 23 more rows
  • 114 names with back-tick
  • Some back-ticks are substituted for single quotes, e.g. O`BRIANT, O`NEAL
  • Others are affixes for no obvious reason, e.g. CASEY`, ANNA`

Map back-tick to empty string
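
The distinction drawn above between embedded back-ticks (likely single-quote substitutes) and leading or trailing ones (unexplained affixes) can be made explicit by classifying the position of the character. This is an illustrative sketch in base R on a hypothetical sample of values from the output above; it assumes each value contains exactly one back-tick.

```r
# Hypothetical sample built from names shown in the output above
x <- c("O`BRIANT", "WOODARD`", "`ROBERT", "LE`ANDRA", "RENEE`")

# Classify where the back-tick sits in each name
position <- ifelse(startsWith(x, "`"), "prefix",
            ifelse(endsWith(x, "`"), "suffix", "embedded"))

print(data.frame(name = x, position = position))
print(table(position))
```

Whatever the position, mapping the back-tick to the empty string treats all three cases the same way.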

3.6.4.9 Check for tilde

d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(last_name, "~"))
# A tibble: 1 x 4
  last_name      first_name midl_name sex   
  <chr>          <chr>      <chr>     <chr> 
1 O~CONNOR-LEWIS BELINDA    JOY       FEMALE
d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(first_name, "~"))
# A tibble: 0 x 4
# … with 4 variables: last_name <chr>, first_name <chr>, midl_name <chr>,
#   sex <chr>
d %>% 
  dplyr::select(last_name, first_name, midl_name, sex) %>%
  dplyr::filter(stringr::str_detect(midl_name, "~"))
# A tibble: 0 x 4
# … with 4 variables: last_name <chr>, first_name <chr>, midl_name <chr>,
#   sex <chr>
  • 1 name with tilde
  • Used as the equivalent of a single quote, e.g. O~CONNOR

Map tilde to empty string
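
The single quote, double quote, asterisk, back-tick and tilde are all mapped to the empty string, so one way to apply those five mappings together is a single regex character class. The sketch below assumes base R and uses a hypothetical sample of values from the outputs above; it is not the author's implementation.

```r
# Hypothetical sample: the same kind of name written with different
# quote-like characters, taken from the outputs above
x <- c("O'NEAL", "O*NEAL", "O`NEAL", "O~CONNOR-LEWIS", "LA\"BEE", "RENEE`")

# Inside a character class these five characters are all literal
cleaned <- gsub("['\"*`~]", "", x)
print(cleaned)
```

After the mapping, the three spellings of O'NEAL in the sample collapse to the same string, ONEAL.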

3.6.4.10 Check for whitespace

x <- d %>% 
  dplyr::select(last_name, first_name, midl_name, sex, age) %>%
  dplyr::filter(stringr::str_detect(last_name, "\\s"))
dim(x)
[1] 13637     5
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name sex age
ABD SHAKUR AMAD NA MALE 53
ABD SHAKUR SADIYAH NA FEMALE 59
AL HUSSAINA IRENE K FEMALE 64
ARNOLD DEW JEANETTE LINDSEY FEMALE 44
BENDER JR JOHN JOHN P MALE 59
DA SILVA JOHN NUNES MALE 78
DA SILVA LISA MARIE FEMALE 34
DA SILVA OPAL JANETTE FEMALE 68
DE BRADY LEONARD DUTCH MALE 75
DEL MAURO DENNIS GERALD MALE 61
DEL ROSARIO ROLITO TUANO MALE 51
DES JARDINS BERNARD WILLIAM MALE 57
DI LORENZO JOSEPH PETER MALE 38
DU BOIS CHARLES LLEWELLYN MALE 68
HOLLERS 111 RUSSELL JOSEPH MALE 40
KROMIS BRESNIHAN JILL SUZANNE FEMALE 31
LA MOTTE JEANNEANE NA FEMALE 57
LAMBERT JR CARL GLEN MALE 72
LE BLANC MICHELLE ANNE FEMALE 35
LE FEVER HOYT T MALE 93
LE MAY SHELDON NA MALE 42
MAC CRINDLE CAMERON CALVIN MALE 69
MAC DONALD SARA R FEMALE 87
MAC DOWELL MIRIAM KUHN FEMALE 79
MAC DOWELL NORMAN MARTIN MALE 84
MC ANIFF JOHN THOMAS MALE 84
MC ANIFF PATRICIA GORDON FEMALE 82
MC CADEN ANNIE LEE FEMALE 86
MC CADEN BARBARA M FEMALE 67
MC CADEN JOHN HENRY MALE 67
MC CADEN MERLEN NA MALE 90
MC CADEN VIOLET NA FEMALE 54
MC CADEN WILLIE LEE MALE 91
MC CADEN WILSON CRAWFORD MALE 71
MC COY JAMES E MALE 84
MC COY JAMES EDWARD MALE 63
MC COY LETTIE B FEMALE 67
MC COY LUCILLE NA FEMALE 83
MC CRAY LINDA MARIE FEMALE 58
MC GARR HOWARD LUTHER MALE 94
MC GHEE ATALANTA B COUSINS FEMALE 105
MC GHEE BESSYE L FEMALE 74
MC GHEE DAVID GRIFFIN MALE 77
MC GUIRE DERYL A FEMALE 62
MC MANNEN MARY HARRIS FEMALE 45
MC MULLEN CHERYL AYSCUE FEMALE 56
MC NAIR FERNELL NA FEMALE 71
MCMILLIAN (MUMFO BETTY ANN FEMALE 55
MCQUEEN (MORRISE MARY LOUISE FEMALE 47
MILLS- KHARBAT TRACIE ROBBIN FEMALE 35
NCT IS WRONG. SENT NA NA FEMALE 13
O BRIEN VICTORIA W FEMALE 81
O HARA MARLENE BIBEY FEMALE 48
O NEAL MARY NEAL FEMALE 68
O NEAL PAUL BLAIR MALE 72
PARISH (RAMON) ROSE MARIE FEMALE 41
REDFEARN- SHELTON CHRISTY MICHELE FEMALE 35
ST CLAIR BONITA SMALLWOOD FEMALE 50
ST CLAIR JAMES W MALE 72
ST CLAIR JEAN M FEMALE 69
ST CLAIR JOYCE NA FEMALE 75
ST CLAIR LESLIE NA FEMALE 0
ST CLAIR PAMELA GRACE FEMALE 50
ST CLAIR RICHARD DAVID MALE 43
ST LOUIS PAMELA QUICK FEMALE 57
ST ONGE RAYMOND F MALE 57
ST PIERR RAYMOND THOMAS MALE 67
ST SING MARY GARDNER FEMALE 76
ST SING ROBERT EDGAR MALE 80
ST SING ROBIN LEE MALE 53
SYKES (BRICKHOUSE) ANTHONY E. FEMALE 73
TIPTON- BARNARD NANCY FAYE FEMALE 40
VAN BALEN RACHELLE M FEMALE 63
VAN BUSKIRK CHERYL ANN FEMALE 44
VAN DEVENTER GRETTA SHORT FEMALE 75
VAN DONSEL JONATHAN ROBERT MALE 29
VAN DORPE ELIZABETH FLUGGER FEMALE 39
VAN DYKE GLORIA JEAN FEMALE 65
VAN DYKE RUTH WILKERSON FEMALE 86
VAN ETTEN DAWN MICHELLE FEMALE 33
VAN HORN BESSIE M FEMALE 100
VAN HORN CRYSTAL M. FEMALE 43
VAN HORN DAVID LANTZ MALE 46
VAN HORN WALLACE NA MALE 69
VAN LOTON MICHAEL J MALE 58
VAN MEIR TERRY WAYNE MALE 42
VAN SCHOLK DOUGLAS RICK MALE 55
VAN SUTPHIN KATHY MASSEY FEMALE 56
VAN ZANDLE JOHN L MALE 87
VAN ZANDLE ROSELYN H FEMALE 82
VANDER STOKKER JUDITH C FEMALE 59
VON BIBERSTEIN CAROLYN BROOKS FEMALE 41
VON BIBERSTEIN CAROLYN LEWIS FEMALE 69
VON BIBERSTEIN RICHARD NA MALE 38
VON BIBERSTEIN RICHARD NA MALE 69
VON BIBERSTEIN SARAH ELIZABETH FEMALE 41
WATTS ST PIERREE MARSHA BEAN FEMALE 51
WHITFIELD KAY M NA NA FEMALE 79
YELLOW ROBE DAVID LEVI MALE 68
YELLOW ROBE SANNA L FEMALE 65
x <- d %>% 
  dplyr::select(last_name, first_name, midl_name, sex, age) %>%
  dplyr::filter(stringr::str_detect(first_name, "\\s"))
dim(x)
[1] 23789     5
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name sex age
ABBOTT JO ANN NA FEMALE 65
ABERCROMBIE JO ANN CREECH FEMALE 71
ABERNATHY JEAN EWART D FEMALE 75
ABERNATHY JO ANN B FEMALE 67
ABERNETHY MARY ETTA SHULL FEMALE 73
ABT D JEAN ADAMS FEMALE 80
ACKERMANN ROSE ELLEN BERNARD FEMALE 55
ADAMS JO ANN SATTERFIELD FEMALE 51
ADKINS H C NA MALE 78
ALBEA MARY ALICE HOLLIDAY FEMALE 72
ALEXANDER W L NA MALE 95
ALFORD MARY HAZEL F FEMALE 78
ALLEN ROBIN LYNNE ALLEY FEMALE 47
ALLEN SARA ELIZABETH PHILLIPS FEMALE 54
ALLISON JO ANN B FEMALE 45
ALSTON BETTY JO NA FEMALE 63
ALSTON JO ANNE R FEMALE 54
ALSTON T. YVONNE MORGAN FEMALE 49
ANDERSON CONNIE JO MITCHELL FEMALE 48
ANDERSON MON AREE NA FEMALE 71
ANTHONY IVY JEANNE S FEMALE 0
ARMSTRONG MARY ANN WALKER FEMALE 71
ARNETTE OLA MAE HAGAMAN FEMALE 54
ARTIS MARY ANN FARRIOR FEMALE 48
ASHBURN G E JACK MALE 63
ASKEW GRACE CAROLYN CONNER FEMALE 62
BAILEY WILLIE MAE SYKES FEMALE 70
BARKER LOU ANNE WILSON FEMALE 52
BECKER T JOHN F MALE 75
BOWLING CATHERINE J HUGHS FEMALE 51
BOYLES DONNA KAY MCMILLAN FEMALE 50
BRANCH JO NELL CORDER FEMALE 68
BRANTLEY WINNIE BELLE B FEMALE 101
BRICKHOUSE IDA VIRGINIA MCPHERSON FEMALE 80
BRICKHOUSE MRS CLAUD NA FEMALE 93
BROADWAY JO ANN HOLMES FEMALE 50
BROWN ALICE KATHRYN COURTURIER FEMALE 52
BROWN CARLA LYNETTE ALLEN FEMALE 42
BURNETTE BETTY JEAN P FEMALE 66
BURNETTE LILLIE MAE B FEMALE 82
CAGLE JOHN (JACK) F MALE 37
CANNADY PATTIE MAE A FEMALE 80
CARPENTER MINNIE BELL C FEMALE 92
CHAVIS JEAN ELLEN MAXWELL FEMALE 46
CHEATHAM ANNIE BELL NA FEMALE 94
CHURCH EDITH ANN ABSHER FEMALE 43
CLAYBORNE BARBARA ANN DANIEL FEMALE 46
COLLINS MARY {HOLLY} HOLLOWELL FEMALE 36
CONOLY GURTIE PEARL LEACH FEMALE 58
COOPER MRS JESSE R FEMALE 90
COULTER DORIS ANN EWING FEMALE 58
COX PATRICIA FAYE SLAUGHTER FEMALE 52
CRONE MARY ANN HELEN FEMALE 54
CURRIN LINDA GAIL HESTER FEMALE 55
DAVENPORT MRS H T FEMALE 90
DAVIES JAMES ALBERT JOHN MALE 60
DAVIS LOU ANN COX FEMALE 52
DEAN FLORA C ELLIS FEMALE 65
DIAL ANNER MARGARE HUNT FEMALE 72
DOLLYHIGH RUTH ALICE ATKINS FEMALE 53
DUDLEY LU ANN C FEMALE 51
EDWARDS BRENDA FAYE WILLIAMS FEMALE 46
EVERTON EDITH FRANCOI ALEXANDER FEMALE 70
FEREBEE TONIA YUVETTE BANKS FEMALE 36
FERRELL MARY JO P FEMALE 65
GAINER MAE ALICE E FEMALE 91
GARRETT BETTY JO M FEMALE 55
GASKILL MARTHA KAY PRESCOTT FEMALE 45
GENTRY MARY LEE T FEMALE 73
GIBBS MRS THEODORE C. FEMALE 84
GILLETTE JO ANN F FEMALE 45
GOLDSMITH LA MURIEL B FEMALE 64
GONZALES LEIGH ANNE LEWIS FEMALE 39
GOOCH MARY DIANE NA FEMALE 49
HARRIS MARY LANE GREEN FEMALE 56
HERSHBERGER CARL HENRY RONALD MALE 59
HOCUTT JO ANN MOODY FEMALE 60
HOLLY MARY LOU SMITH FEMALE 93
HORNE GLORIA J. DUNLAP FEMALE 44
HORTON JO ANNE CARDEN FEMALE 56
HORTON MYRTLE LEE WALKER FEMALE 84
ICENHOUR R L NA MALE 82
JACKSON SUSAN RAE FREEMAN FEMALE 35
KING ROSA LEONIA PLEDGER FEMALE 52
LAMBERT SHERRY G DAVIS FEMALE 45
LEE MELV IN RAY MALE 58
LOCKLEAR ELISA SUE BULLARD FEMALE 47
LOCKLEAR GEANIE ANN JACOBS FEMALE 51
LOCKLEAR GLORIA DALE CHAVIS FEMALE 45
MIXON ALLIE AMANDA ABBOTT FEMALE 36
MORRISON MARY SUSAN MCALLISTER FEMALE 48
NEEDHAM DEBORAH LYNN ALBRIGHT FEMALE 47
NORRIS MARY KAY COLLINS FEMALE 74
PLEDGER LASHON RAQUEL BAILEY FEMALE 31
PRUITT SANDRA GREY DEAN FEMALE 41
RUDD ROBIN SUE UNDERWOOD FEMALE 42
SHEPARD MONICA LEE ARTIS FEMALE 33
WADE BETTY JO HILL FEMALE 46
WILLIAMS VELMA LOUISE ALSTON FEMALE 49
WINSTON MYRA DIAN MCNEILL FEMALE 46
x <- d %>% 
  dplyr::select(last_name, first_name, midl_name, sex, age) %>%
  dplyr::filter(stringr::str_detect(midl_name, "\\s"))
dim(x)
[1] 74410     5
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name sex age
ABERNATHY BARBARA JEAN MILLER FEMALE 63
ABERNATHY SADIE MAE WALTON FEMALE 0
ABSHER ASA LEE MATHIS FEMALE 75
ACHESON JENNIFER DAWN EVANS FEMALE 27
ADAMS BETTY JO B FEMALE 70
ADAMS DONNA SUE HELMICK FEMALE 59
ADAMS NEEDHAM MC CLEESE MALE 62
ADAMS ROSA LEE S FEMALE 69
ADAMS RUSSELL A D MALE 87
ADKINS EMMALINE SUE GOODE FEMALE 52
ALBER MICHELE MARIE SUTTHOFF FEMALE 39
ALBRECHT FRED R K MALE 67
ALDRIDGE ESTHER MADGELINE H FEMALE 61
ALEXANDER PHYLLIS A HAYWOOD FEMALE 53
ALLEN MARY ELIZABETH AU FEMALE 79
ALLISON KAREN VANESSA EARL FEMALE 50
ALLISON MANTIE MIRIAM P FEMALE 85
AMMONS EVA MAE INMAN FEMALE 86
AMOS HATTIE BELL G FEMALE 89
AMOS MARY CATHERINE W MALE 80
ANDERSON DORTHY LEE G FEMALE 76
ANDERSON PRISCILLA M CLARK FEMALE 39
ANDERSON SARAH FRANCES A FEMALE 92
ANTHONY KATY LUCELE PITTM FEMALE 70
ARD AMY LORRAINE REY FEMALE 30
ARROWOOD LAURA SUE ALLEN FEMALE 69
BACON JOYCE ANN DOBIAS FEMALE 54
BAILEY BRENDA J BREVARD FEMALE 105
BAKER ALLIE BELLE P FEMALE 93
BAKER HAZEL IRENE HINSON FEMALE 80
BARGSLEY TERESA ANN VINES FEMALE 41
BARHAM BARBARA YVONNE FLOYD FEMALE 51
BARKSDALE MARGURITE J H FEMALE 85
BAUCOM BEULAH WILMA R FEMALE 87
BAUCOM GLORIA JANE WHITLEY FEMALE 54
BAUCOM JESSIE MAE RUSHING FEMALE 89
BECKNER SYLVIA TERRY RIVES FEMALE 68
BEMBURY LOU ELLA JONES FEMALE 78
BLACKWELDER MATTIE SUE S FEMALE 83
BLAND ANNIE MAE B FEMALE 84
BOSWELL KAY ELAINE AUSTI FEMALE 63
BRAYBOY ROY MAE B. FEMALE 66
BRIDGES ELEANOR P SHERWOOD FEMALE 88
BROOKS LINDA M HANEY FEMALE 60
BROOKS MARY RUTH T FEMALE 86
BROWN ELIZABETH DALE QUILL FEMALE 92
BUMPHUS CORA G ROYSTER FEMALE 74
BUNDY PATRICIA LYNN SMITH FEMALE 47
BUNN SARAH VIRGINIA WAL FEMALE 53
BURNETTE NORMA JEAN SETZER FEMALE 68
BURNETTE OLA MAE NOBLITT FEMALE 70
CAMP AUDIE MAE BYRD FEMALE 87
CARTER EDNA G K FEMALE 96
CHAMBERS DEBORAH JEAN BUCHANA FEMALE 47
CHEATHAM JOSEPH MC COY MALE 88
CHOCKLEY GEORGE MC ADAMS MALE 55
COLLIER MILDRED GRACE WILLIA FEMALE 78
COUGHENOUR ROBBIN L E FEMALE 47
COX MARY B CLARK FEMALE 42
CRISCO MARY LEE M FEMALE 83
DEAL ALMA JEAN ALDRIDG FEMALE 55
DUGGER NOLA ANITA MRS FEMALE 95
DUNLEVY LINDA RUTH BOWLES FEMALE 58
EDWARDS MELLISA ODELL T FEMALE 97
ELMORE THELMA CLEADIS ARNO FEMALE 77
FLAKES LACIA LA’SHAUN A FEMALE 33
GALLION MARGIE PAULINE MCGEE FEMALE 82
GRANT LUCY VIVIAN DAVIS FEMALE 68
GREENE DOROTHY LINDA SIMPSO FEMALE 59
GRESSLEY ELIZABETH INOGENE ECH FEMALE 81
GRUBB JO ELLEN SMITH FEMALE 54
HAMMER MELINDA PAIGE CARRIGAN FEMALE 37
HARRIS FRANCES ASBURY BARTL FEMALE 80
HENSON JOHNNY DWIGHT MR MALE 52
HICKS BILLY DEAN MR MALE 46
HINTON HAZEL L BOONE FEMALE 55
IMHOFF LISA D MS FEMALE 48
JAMES KIM L.ORUNE OLDS MALE 43
JOHNSON PAMELA S BIDDY FEMALE 0
JONES SHARON ANN MCNAIR FEMALE 51
KEEVER MELINDA SUE RICE FEMALE 39
KELLY THOMAS THADDEUS ELL MALE 39
KIVETT PATTY YORK BALDWIN FEMALE 75
LEE SALINA LISA MARIE FEMALE 37
LINCOLN OLETA GRIGGS BURGI FEMALE 87
MILLER KATRINA MICHELE RECTOR FEMALE 33
MOFFITT HATTIE NELL HENRY FEMALE 78
OWENBY ALBERTA MARIE BURNS FEMALE 60
PRESTON HELEN C K E FEMALE 46
PRUITT NELDA MURIEL ABEE FEMALE 72
RASH BEULAH M DUTY FEMALE 71
RICHARDS GRACIE E HERRIN FEMALE 62
RICHARDSON POLLY V RICHARDSON FEMALE 48
RICKS SUSAN ANN BOLICK FEMALE 71
RIDDLE KATHERINE DENISE B FEMALE 45
SHEETS DEBORAH CHARLENE CHU FEMALE 47
STREET REVONA ELAINE BIRCH FEMALE 50
WILEY DONNA SUZANNE HAYE FEMALE 51
WILLIAMS BRENDA DENISE BAUCO FEMALE 47
YOUNG BARBARA DIANNE ROGER FEMALE 58
  • 111,836 names with whitespace
  • Some whitespace is because of prefixes, e.g. DA SILVA, LE BLANC
  • Some whitespace is separating generation suffixes, e.g. BENDER JR, HOLLERS 111
  • Some whitespace is used instead of a hyphen, e.g. KROMIS BRESNIHAN, WATTS ST PIERREE
  • Some whitespace is incorrectly inserted, e.g. MILLS- KHARBAT, REDFEARN- SHELTON
  • Some whitespace probably varies from person to person for the same name, e.g. MC CADEN, MC COY
  • Some whitespace separates alias/nick/former names, e.g. PARISH (RAMON), SYKES (BRICKHOUSE)
  • Some whitespace is used instead of a single quote, e.g. O BRIEN, O NEAL
  • Some whitespace is separating an honorific title, e.g. MRS CLAUD, MRS THEODORE
  • Some whitespace is separating a final honorific title, e.g. D MS

Map whitespace to empty string
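
A minimal sketch of the whitespace mapping, assuming base R and the POSIX class `[[:space:]]` as an equivalent of the `\\s` pattern used in the checks above. The sample values come from the tables above, except "MCCOY", which is a hypothetical unspaced variant added for comparison.

```r
# Sample values from the tables above; "MCCOY" is a hypothetical variant
x <- c("MC COY", "MCCOY", "O NEAL", "DA SILVA", "MILLS- KHARBAT", "BENDER JR")

# Remove every run of whitespace characters
cleaned <- gsub("[[:space:]]+", "", x)
print(cleaned)
```

The spaced and unspaced variants of MC COY become identical, which is the point of the mapping. A side effect visible in the sample is that a generation suffix entered in the name field is fused to the surname (BENDERJR).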

3.6.4.11 Check for period

x <- d %>% 
  dplyr::filter(stringr::str_detect(last_name, "\\."))
dim(x)
[1] 44  8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
BINGHAM JR. AMES EDMOND NA MALE 32 ACTIVE VERIFIED
BRICE.MICHAEL ARTHUR NA NA NA MALE 37 REMOVED DUPLICATE
BURWELL JR. DANNY EDWARD NA MALE 63 ACTIVE LEGACY DATA
DAYE JR. JAMES NA JR MALE 31 ACTIVE VERIFIED
NCT IS WRONG. SENT NA NA NA FEMALE 13 REMOVED REMOVED UNDER OLD PURGE LAW
PENDLETON-S. JENNIFER KRISTIN NA FEMALE 35 INACTIVE CONFIRMATION NOT RETURNED
ROGERS,JR. DAVID J. NA MALE 73 REMOVED REMOVED UNDER OLD PURGE LAW
RUSSELL, JR. KERMITT PATRICK NA MALE 36 ACTIVE VERIFIED
SHIELDS. DIANE PAYNE NA FEMALE 56 REMOVED REMOVED UNDER OLD PURGE LAW
ST. CLAIR HARRY NEIL NA MALE 53 ACTIVE LEGACY DATA
ST. CLAIR HAZEL MAIE NA FEMALE 85 ACTIVE LEGACY DATA
ST. CLAIR JACK LEE NA MALE 54 ACTIVE VERIFIED
ST. CLAIR KAREN LIPKA NA FEMALE 56 REMOVED MOVED FROM COUNTY
ST. CLAIR KATHLEEN MAY NA FEMALE 76 INACTIVE CONFIRMATION NOT RETURNED
ST. CLAIR MOLLIE MCSWAIN NA FEMALE 87 ACTIVE LEGACY DATA
ST. CLAIR ROBERT BENJAMIN NA MALE 58 REMOVED REMOVED UNDER OLD PURGE LAW
ST. CLAIR WALTER RAYMOND NA MALE 89 ACTIVE LEGACY DATA
ST. CLAIR JR JAMES JOSEPH NA MALE 78 INACTIVE CONFIRMATION NOT RETURNED
ST. CYR CANDICE NICOLE NA FEMALE 31 ACTIVE VERIFIED
ST. DENIS MICHAEL DAVID NA MALE 39 INACTIVE CONFIRMATION NOT RETURNED
ST. GEORGE LANDIS MEDDERS NA FEMALE 44 REMOVED REMOVED UNDER OLD PURGE LAW
ST. GEORGE MARTHA S NA FEMALE 86 ACTIVE VERIFIED
ST. GERMAIN AMY NA NA FEMALE 23 ACTIVE VERIFIED
ST. JOHN CONSTANCE LINDA NA FEMALE 83 REMOVED REMOVED UNDER OLD PURGE LAW
ST. JOHN JESSICA JO NA FEMALE 29 ACTIVE VERIFIED
ST. LAWRENCE ELIZABETH W NA FEMALE 76 ACTIVE VERIFIED
ST. LEGER MARIE K NA FEMALE 66 REMOVED REMOVED UNDER OLD PURGE LAW
ST. LUKE KRYSTIAN ISAIAH NA MALE 29 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
ST. PIERRE MARION FOGELBAUM NA FEMALE 83 ACTIVE LEGACY DATA
ST. ROMAIN ANGIE CHRISTINA NA FEMALE 28 ACTIVE LEGACY DATA
ST. ROMAIN LUCKY JOE NA MALE 48 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
ST. SAUVEUR JILL JULIE-ANN NA FEMALE 49 REMOVED REMOVED UNDER OLD PURGE LAW
ST. WINTER JOHN RANDALL NA MALE 43 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
ST.CLAIR JENNIFER TALLY NA FEMALE 29 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
ST.CLAIRE KEVIN WAYNE NA MALE 32 ACTIVE VERIFIED
ST.GEORGE BLANE STEPHEN NA MALE 43 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
ST.GERMAINE ADOLPHUS BERNARD NA MALE 68 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
ST.HILAIRE ANN MARIE NA FEMALE 36 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
ST.JOHN JOANN DIMAGGIO NA FEMALE 47 ACTIVE VERIFIED
ST.LOUIS VICKIE ANN NA FEMALE 63 REMOVED REMOVED UNDER OLD PURGE LAW
ST.PIERRE KEITH JOSEPH NA MALE 33 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
TRIVETTE JR. GARY LEE NA MALE 33 ACTIVE LEGACY DATA
VALKENAAR . JAMES NA JR MALE 48 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
WILSON JR. DAVID RAY NA MALE 29 ACTIVE LEGACY DATA
x <- d %>% 
  dplyr::filter(stringr::str_detect(first_name, "\\."))
dim(x)
[1] 651   8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
ADAMS E. VANCE NA MALE 0 REMOVED REMOVED UNDER OLD PURGE LAW
AINSLEY J. (JULIUS) T.(THOMAS) NA MALE 65 ACTIVE VERIFIED
ALSTON T. YVONNE MORGAN NA FEMALE 49 ACTIVE LEGACY DATA
AMIDON R. LOUISE NA FEMALE 102 REMOVED DECEASED
ANDERSON B. J. NA FEMALE 81 REMOVED REMOVED UNDER OLD PURGE LAW
ATTKISSON J. M. JR MALE 99 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BAILEY H. COLEMAN JR MALE 57 REMOVED REMOVED UNDER OLD PURGE LAW
BEARD E. BLAIR FARROW NA FEMALE 41 ACTIVE LEGACY DATA
BOUCHER T. RENEE NA NA FEMALE 25 ACTIVE VERIFIED
BRATCHER G. B. NA MALE 58 REMOVED MOVED FROM COUNTY
BRIGGS N. GERTRUDE RHODES NA FEMALE 94 REMOVED DECEASED
BRODIE MICHAEL L. NA NA MALE 47 REMOVED FELONY CONVICTION
BROOKS W. HALL NA MALE 85 REMOVED DECEASED
BROWN A. S. SR MALE 81 REMOVED DECEASED
CAIN W. R. NA MALE 82 REMOVED DECEASED
CALLICUTT J.C. NA NA MALE 75 REMOVED DECEASED
CANNADY FANNIE B. CREWS NA FEMALE 81 REMOVED DECEASED
CARAWAN OTTIS, JR. NA NA MALE 85 ACTIVE LEGACY DATA
CARROLL R. BRUCE NA MALE 91 REMOVED REMOVED UNDER OLD PURGE LAW
CHAMPION W. DUKE NA MALE 92 REMOVED REMOVED UNDER OLD PURGE LAW
CHAPPELL M. B. NA MALE 90 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
CLAYTON H. LESLIE NA MALE 93 ACTIVE LEGACY DATA
COMSTOCK W. J. NA MALE 83 REMOVED DECEASED
COOPER C. D. NA MALE 89 REMOVED DECEASED
COOPER EDITH M. KITTRELL NA FEMALE 63 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
COTTEN H. LA DENA BANTA NA FEMALE 75 REMOVED DECEASED
COX W. A. NA MALE 70 ACTIVE LEGACY DATA
DAVIS J.B. NA NA MALE 78 REMOVED ADMINISTRATIVE
DAVIS W.J. NA NA MALE 111 REMOVED ADMINISTRATIVE
DICKERSON R. B. JR MALE 78 REMOVED DECEASED
DILLEHAY DULCIE P. ELLINGTON NA FEMALE 52 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DOCKERY J. M. NA MALE 43 ACTIVE VERIFIED
DUKES L. D. NA MALE 93 REMOVED MOVED FROM COUNTY
DUNCAN C. L. NA MALE 91 REMOVED DECEASED
DURHAM RUCINADA C. FIELDS NA FEMALE 39 ACTIVE LEGACY DATA
EDWARDS J. T. NA MALE 82 REMOVED DECEASED
ELLIS W. PRISCILLA C. NA FEMALE 43 REMOVED MOVED FROM STATE
ENGLAND P.W. NA NA MALE 82 ACTIVE VERIFIED
FINCH C. STEWART JR MALE 91 REMOVED DECEASED
FOSTER VIRGINIA M. JONES NA FEMALE 45 REMOVED FELONY CONVICTION
GIBBS CALEB, JR. NA NA MALE 80 ACTIVE LEGACY DATA
GUFFEY J. L. NA MALE 70 ACTIVE VERIFIED
HALL J. C. NA MALE 77 ACTIVE VERIFIED
HARTSELL A. EUGENE NA MALE 76 REMOVED DECEASED
HAYWOOD G. A JR MALE 79 ACTIVE VERIFIED
HEISKELL E. LORRAINE NA FEMALE 79 REMOVED MOVED FROM COUNTY
HEROLD I. NA NA FEMALE 72 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
HICKS TAMMY L. NA NA FEMALE 42 REMOVED REMOVED UNDER OLD PURGE LAW
HODGE C. T. SR MALE 90 REMOVED DECEASED
HOPKINS G. L. NA MALE 97 REMOVED DECEASED
HORNE GLORIA J. DUNLAP NA FEMALE 44 ACTIVE CONFIRMATION PENDING
JOYNER O. ELIZABETH NA FEMALE 94 REMOVED MOVED FROM COUNTY
KEEN W. E. NA MALE 105 REMOVED DECEASED
LEADFORD J. B. NA MALE 63 REMOVED MOVED FROM COUNTY
LEARY R. S. NA MALE 91 REMOVED DECEASED
LEDFORD T. G. NA MALE 80 REMOVED DECEASED
LILLEY G. C. NA MALE 80 ACTIVE VERIFIED
LLOYD GERTRUDE D. BRUNSON NA FEMALE 60 REMOVED REMOVED UNDER OLD PURGE LAW
LOTT H. R. NA MALE 103 REMOVED REMOVED UNDER OLD PURGE LAW
MANEY B. T. NA MALE 113 REMOVED DECEASED
MARTIN C. M. NA MALE 85 REMOVED REMOVED UNDER OLD PURGE LAW
MCCALL R. J. NA MALE 60 REMOVED MOVED FROM COUNTY
MCCLEESE REV. MINNIE NA FEMALE 83 REMOVED DECEASED
MCKINNEY A.J NA NA MALE 93 REMOVED DECEASED
MCLEOD ANNETTE E. HALL NA FEMALE 52 ACTIVE VERIFIED
MCMILLAN L. C. NA MALE 74 ACTIVE LEGACY DATA
MICHELS G. E. NA MALE 87 REMOVED DECEASED
MOORE BOOKER T. WASHINGTON JR MALE 49 REMOVED REMOVED UNDER OLD PURGE LAW
MORRIS C. E. NA MALE 99 REMOVED DECEASED
PLEDGER J. MELVIN NA MALE 80 REMOVED DECEASED
PROCTOR J.D., NA JR FEMALE 76 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
RAWLES V. E. JR MALE 97 REMOVED DECEASED
RICE R.B. NA NA MALE 89 REMOVED DECEASED
ROUGHTON J. WARREN NA MALE 77 REMOVED DECEASED
SADLER H. L JR MALE 75 ACTIVE LEGACY DATA
SATTERWHITE J. FURMAN NA MALE 87 REMOVED DECEASED
SAUNDERS J.C. (MIKE) NA NA MALE 109 REMOVED REMOVED UNDER OLD PURGE LAW
SAWYER J. D. NA MALE 72 ACTIVE LEGACY DATA
SAWYER W. W. NA MALE 90 REMOVED DECEASED
SEALS L. B. NA MALE 83 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
SELLARS C. P. NA MALE 91 REMOVED DECEASED
SENTER MARY ELIZ. T. NA FEMALE 66 REMOVED REMOVED UNDER OLD PURGE LAW
SHEPARD J. W. NA MALE 47 ACTIVE VERIFIED
SHOAFE M.H. NA NA MALE 56 ACTIVE VERIFIED
SNEED H. H. NA MALE 97 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
SOLOMON TERESSA A. JONES NA FEMALE 0 REMOVED REMOVED UNDER OLD PURGE LAW
STEGALL T. E. NA MALE 98 REMOVED DECEASED
STEVENSON C. R. JR MALE 67 ACTIVE LEGACY DATA
SWAIN B. DUDLEY NA MALE 85 REMOVED DECEASED
SWAIN J. EDWARD NA MALE 95 REMOVED DECEASED
TATUM N.C. NA NA MALE 50 ACTIVE LEGACY DATA
TAYLOR LILLIE R. P. NA FEMALE 73 REMOVED DECEASED
THOMPSON J.D. NA NA MALE 78 REMOVED REMOVED UNDER OLD PURGE LAW
THORNTON EMMA L. COOPER NA FEMALE 66 ACTIVE LEGACY DATA
VOLIVA R. O (OKLEY) NA MALE 91 ACTIVE CONFIRMATION PENDING
WALKER J. W. NA MALE 99 REMOVED DECEASED
WATKINS HARRIETT B. DICKERSON NA FEMALE 45 ACTIVE LEGACY DATA
WHITNER D. A. NA MALE 76 ACTIVE VERIFIED
WHITT G. RANDALL NA MALE 43 ACTIVE LEGACY DATA
WINDLEY L. B. NA MALE 79 ACTIVE VERIFIED
x <- d %>% 
  dplyr::filter(stringr::str_detect(midl_name, "\\."))
dim(x)
[1] 9322    8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
ABBOTT ALINE P. NA FEMALE 93 REMOVED DECEASED
ABBOTT BARBARA M. NA FEMALE 65 ACTIVE LEGACY DATA
ABBOTT BETTY G. NA FEMALE 100 REMOVED DECEASED
ABBOTT DAVID B. NA MALE 46 ACTIVE LEGACY DATA
ABBOTT DORIS M. NA FEMALE 85 REMOVED DECEASED
ABBOTT DOROTHY M. NA FEMALE 62 ACTIVE LEGACY DATA
ABBOTT EDITH M. NA FEMALE 72 REMOVED DECEASED
ABBOTT MANOLIA S. NA FEMALE 94 ACTIVE LEGACY DATA
ABBOTT MARTHA W. NA FEMALE 66 ACTIVE LEGACY DATA
ABBOTT RACHEL H. NA FEMALE 81 REMOVED REMOVED UNDER OLD PURGE LAW
ABBOTT WILLIAM A. NA MALE 71 REMOVED DECEASED
ADAMS CLIFFORD C. NA MALE 88 REMOVED DECEASED
ADAMS EDITH B. NA FEMALE 91 REMOVED DECEASED
ADAMS EDITH P. NA FEMALE 84 REMOVED DECEASED
ADAMS GOLDIE W. NA FEMALE 92 REMOVED DECEASED
ADAMS HILDA E. NA FEMALE 85 REMOVED DECEASED
ADAMS JOHN C. JR MALE 57 ACTIVE VERIFIED
ADAMS JOHN C. SR MALE 92 REMOVED DECEASED
ADAMS SHEILA L. NA FEMALE 57 REMOVED REMOVED UNDER OLD PURGE LAW
ADAMS SUE R. NA FEMALE 85 INACTIVE CONFIRMATION NOT RETURNED
ADCOCK ELIZABETH Y. NA FEMALE 88 ACTIVE LEGACY DATA
ADCOX GLADYS R. NA FEMALE 78 REMOVED REMOVED UNDER OLD PURGE LAW
ALDRICH ESTHER E. NA FEMALE 75 ACTIVE LEGACY DATA
ALEXANDER ANDREW J. NA MALE 71 ACTIVE VERIFICATION PENDING
ALEXANDER EDNA I. NA FEMALE 70 ACTIVE VERIFICATION PENDING
ALEXANDER JOHN H. NA MALE 54 REMOVED REMOVED UNDER OLD PURGE LAW
ALEXANDER KERRY K. NA MALE 54 REMOVED REMOVED UNDER OLD PURGE LAW
ALEXANDER LAURA M. NA FEMALE 92 REMOVED DECEASED
ALEXANDER ROBERT T. NA MALE 51 REMOVED REMOVED UNDER OLD PURGE LAW
ALLEN ANNIE T. NA FEMALE 56 ACTIVE LEGACY DATA
ALLEN CRAIG A. NA MALE 41 ACTIVE LEGACY DATA
ALLEN LOIS H. NA FEMALE 73 ACTIVE LEGACY DATA
ALLEN ROSA B. NA FEMALE 80 REMOVED DECEASED
ALLEN SALLIE P. NA FEMALE 70 ACTIVE LEGACY DATA
ALSTON ANGELIA D. NA FEMALE 52 ACTIVE LEGACY DATA
ALSTON ANNA H. NA FEMALE 80 REMOVED DECEASED
AMARAL HERBERT V. NA MALE 71 ACTIVE LEGACY DATA
ANDERSON B. J. NA FEMALE 81 REMOVED REMOVED UNDER OLD PURGE LAW
ANDERSON JERRY J. NA MALE 60 REMOVED DECEASED
BACON BARBARA M. NA FEMALE 55 ACTIVE LEGACY DATA
BAILEY CORA V. NA FEMALE 62 ACTIVE VERIFICATION PENDING
BAILEY EFFIE R. NA FEMALE 56 ACTIVE VERIFICATION PENDING
BAILEY HORACE K. SR MALE 51 ACTIVE VERIFICATION PENDING
BAILEY NOLA M. NA FEMALE 89 REMOVED DECEASED
BAILEY THOMAS M. NA MALE 90 REMOVED DECEASED
BAIRD MINNIE M. NA FEMALE 93 REMOVED DECEASED
BARNES MARY S. NA FEMALE 84 REMOVED DECEASED
BARNES PATTIE D. NA FEMALE 106 REMOVED DECEASED
BENTHALL JEAN V. NA FEMALE 59 ACTIVE VERIFIED
BLACK CLARA M. NA FEMALE 72 ACTIVE VERIFIED
BLEVINS EARNEST G. NA MALE 81 ACTIVE LEGACY DATA
BOGUES LUTHER JR. NA MALE 82 REMOVED DECEASED
BOONE EVELYN S. NA FEMALE 81 ACTIVE CONFIRMATION PENDING
BRANCH PEARL B. NA FEMALE 90 ACTIVE VERIFIED
BRAYBOY ROY MAE B. NA FEMALE 66 ACTIVE VERIFIED
BREWER MYRTLE L. NA FEMALE 81 ACTIVE LEGACY DATA
BRICKHOUSE INDIA L. NA FEMALE 84 ACTIVE VERIFICATION PENDING
BROOKS RICHARD L. NA MALE 97 REMOVED MOVED FROM COUNTY
BROWN BIANCA M. NA FEMALE 71 ACTIVE LEGACY DATA
BRYANT DANIEL W. NA MALE 89 ACTIVE VERIFICATION PENDING
BRYANT FLORA R. NA FEMALE 51 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BRYANT FLOSSIE B. NA FEMALE 71 ACTIVE VERIFICATION PENDING
BRYANT SADIE L. NA FEMALE 70 ACTIVE VERIFICATION PENDING
BRYANT VITEORA J. NA FEMALE 89 REMOVED DECEASED
BRYANT WILLIAM H. NA MALE 92 REMOVED DECEASED
BUTTS JESSE P. NA MALE 62 ACTIVE LEGACY DATA
CABLE SAM A. NA MALE 74 ACTIVE VERIFIED
CAGLE SHIRLEY L C. NA FEMALE 47 REMOVED MOVED FROM COUNTY
CAHOON BEULAH C. NA FEMALE 73 ACTIVE LEGACY DATA
CAHOON JULIA J. NA FEMALE 62 ACTIVE VERIFIED
CAHOON LENORA C. NA FEMALE 95 REMOVED DECEASED
CORLEY STELLA H. NA FEMALE 53 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
CRAWLEY JUNIUS W. NA MALE 86 ACTIVE LEGACY DATA
CRAWLEY LUCILE F. NA FEMALE 90 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
CRAWLEY ROBERT E. NA MALE 71 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
CUTHRELL WILLIAM A. NA MALE 84 ACTIVE LEGACY DATA
DAVIS CHRISTEEN C. NA FEMALE 88 REMOVED DECEASED
DAVIS HAMILTON E. SR MALE 92 REMOVED DECEASED
DAVIS HAMILTON E. JR MALE 62 ACTIVE VERIFIED
DAVIS ODELIA P. NA FEMALE 79 ACTIVE VERIFIED
DUKE CYNTHIA R. NA FEMALE 46 ACTIVE LEGACY DATA
DUKE FRED R. NA MALE 76 ACTIVE LEGACY DATA
GAY CYNDA D. NA FEMALE 35 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
GIBBS JEFF G. NA MALE 58 ACTIVE LEGACY DATA
GIBBS MRS THEODORE C. NA FEMALE 84 REMOVED DECEASED
HOLLIS PAULINE B. NA FEMALE 78 ACTIVE VERIFICATION PENDING
HOLLIS THERESA T. NA FEMALE 53 ACTIVE VERIFICATION PENDING
HOLMES WILMA C. NA FEMALE 90 ACTIVE VERIFICATION PENDING
HUDSON EUNICE G. NA FEMALE 79 ACTIVE LEGACY DATA
JAMES KIM L.ORUNE OLDS NA MALE 43 ACTIVE VERIFICATION PENDING
KING GEORGE H. NA MALE 55 ACTIVE VERIFICATION PENDING
KNOTTS SYBLE S. NA FEMALE 68 ACTIVE VERIFICATION PENDING
LEARY OLIVIA H. NA FEMALE 62 ACTIVE VERIFICATION PENDING
LIVERMAN ALICE J. NA FEMALE 86 REMOVED DECEASED
LIVERMAN JAMIE C. NA MALE 45 REMOVED MOVED FROM COUNTY
LIVERMAN MARGARET A. NA FEMALE 87 REMOVED DECEASED
MCCLEES PACOHONTAS B. NA FEMALE 92 ACTIVE VERIFICATION PENDING
MCGUINNESS ILA K. NA FEMALE 67 REMOVED MOVED FROM COUNTY
SAWYER MARIE C. NA FEMALE 49 ACTIVE LEGACY DATA
SMITH ALICE C. NA FEMALE 81 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
  • 10,017 names with period
  • Some are abbreviations of SAINT, although spacing is inconsistent, e.g. ST. CLAIR, ST.CLAIR
  • Some are abbreviations of a generation suffix, e.g. JR.
  • Some are substituted for whitespace, e.g. BRICE.MICHAEL ARTHUR
  • Some appear to be a random termination, e.g. PENDLETON-S., SHIELDS.
  • Some indicate initials, e.g. W., J.C.
  • Some indicate a contraction, e.g. MARY ELIZ.
  • Some appear to substitute for a single quote, e.g. L.ORUNE OLDS

Map period to empty string
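
The sketch below, assuming base R, illustrates two points about the period mapping on a hypothetical sample of values from the tables above. First, an unescaped `.` is a regex wildcard, so the period has to be escaped (as `\\.` in the checks above) or matched as a fixed string. Second, the inconsistent spellings of ST. CLAIR only collapse to one string once the whitespace mapping is applied as well.

```r
# Sample values from the tables above
x <- c("ST. CLAIR", "ST.CLAIR", "ST CLAIR", "BINGHAM JR.", "J.C.")

# Pitfall: as a regex, "." matches every character, so everything is removed
wildcard <- gsub(".", "", "ST. CLAIR")
print(wildcard)

# Literal match: only the periods are removed
no_period <- gsub(".", "", x, fixed = TRUE)
print(no_period)

# Combined with the whitespace mapping, the ST. CLAIR variants coincide
no_space <- gsub("[[:space:]]+", "", no_period)
print(no_space)
```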

3.6.4.12 Check for comma

x <- d %>% 
  dplyr::filter(stringr::str_detect(last_name, ","))
dim(x)
[1] 63  8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
AMATO,KATHERINE,M NA NA NA FEMALE 50 REMOVED DUPLICATE
AMIDON,PETER,LEVENT NA NA NA MALE 33 REMOVED DUPLICATE
BELL,MITCHELL THOMAS ,II NA NA MALE 35 REMOVED DUPLICATE
BEST,SYDNEY,ALLISON NA NA NA FEMALE 37 REMOVED DUPLICATE
BOYD,ALLEN AUBREY,II NA NA NA MALE 35 REMOVED DUPLICATE
BROWN,FREDERIC CHEST ER,JR NA NA MALE 60 REMOVED DECEASED
BROWN,ROBERT EDWARD, JR NA NA MALE 38 REMOVED DUPLICATE
BUCHANAN,SAMMY JOE,J R NA NA MALE 40 REMOVED DUPLICATE
BUNTON,RAYMOND AVNEY ,JR NA NA MALE 46 REMOVED DUPLICATE
BURGESS,WINFRED LEE, JR NA NA MALE 54 REMOVED DUPLICATE
BURKE, GEORGE W NA FEMALE 69 REMOVED REQUEST FROM VOTER
BURNETTE,TOMMY WILLI AM,JR NA NA MALE 35 REMOVED DUPLICATE
CARR,WENDELL,H JR NA NA NA MALE 36 REMOVED DUPLICATE
CATHEY,LONNIE,JR NA NA NA MALE 57 REMOVED DUPLICATE
CUSTER,GEORGE D,JR NA NA NA MALE 51 REMOVED DUPLICATE
DAVIS,LEWIS EVERETTE ,JR NA NA MALE 54 REMOVED DUPLICATE
EDWARDS,MARK BROWNLO W,JR NA NA MALE 37 REMOVED DUPLICATE
FERNANDEZ,DE CASTRO, SCOTT NA NA MALE 34 REMOVED DUPLICATE
FILLINGHAM, II ROBERT E NA MALE 53 ACTIVE VERIFIED
FORTNER,II JERRY J NA MALE 38 ACTIVE LEGACY DATA
FULK,IVEY LEE,JR NA NA NA MALE 45 REMOVED DUPLICATE
FUTRELL,JOHN MARION, JR NA NA MALE 47 REMOVED DUPLICATE
GARRISON,JAMES MARVI N,JR NA NA MALE 57 REMOVED DUPLICATE
HALL,PONTHEOLA,M NA NA NA FEMALE 53 REMOVED DUPLICATE
HODNETT,DORGIE,JR NA NA NA MALE 52 REMOVED DUPLICATE
HOGSHEAD,THOMAS H,JR NA NA NA MALE 66 REMOVED DUPLICATE
HOOKER,GIRRIE MATHIS ,III NA NA MALE 48 REMOVED DUPLICATE
HUGHES,KELLEY,SUZETT E NA NA FEMALE 38 REMOVED DUPLICATE
JENKINS,JAMES W,JR NA NA NA MALE 36 REMOVED DUPLICATE
JOHNSON,BILLY TURNER ,JR NA NA MALE 44 REMOVED DUPLICATE
JONES,JOHNSIE,H NA NA NA FEMALE 92 REMOVED DUPLICATE
KEY,GENE SAMUEL,JR NA NA NA MALE 44 REMOVED DUPLICATE
LUCAS,KENNETH SHELTO N,JR NA NA MALE 45 REMOVED DUPLICATE
MAJETTE,GEORGE THURM AN,JR NA NA MALE 35 REMOVED DUPLICATE
MAPP,DWIGHT,BENJAMIN NA NA NA MALE 57 REMOVED DUPLICATE
MCCRARY,RICHARD DALE ,JR NA NA MALE 41 REMOVED DUPLICATE
MOORE,JIMMY GORDON,S R NA NA MALE 64 REMOVED DUPLICATE
MOORING,MOLLY TUTTEROW NA NA FEMALE 61 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MOREHEAD,JESSE JAMES ,JR NA NA MALE 42 REMOVED DUPLICATE
MURPHY,CHARLES ST C, III NA NA MALE 58 REMOVED DUPLICATE
OERTHER,FREDERICK,JO HN NA NA MALE 46 REMOVED DUPLICATE
PERRY,EMMETT,PERRY J R NA NA MALE 45 REMOVED DUPLICATE
PURGASON,SILAS WILSO N,JR NA NA MALE 84 REMOVED DUPLICATE
RAYLE,,GORDON HENRY JR NA NA MALE 64 REMOVED DUPLICATE
REED,CHARLES LARUS,I II NA NA MALE 40 REMOVED DUPLICATE
REYES,CHARLES MANUEL ,JR NA NA MALE 53 REMOVED DUPLICATE
ROGERS,JR. DAVID J. NA MALE 73 REMOVED REMOVED UNDER OLD PURGE LAW
ROUSE,ESTHER, MAE NA NA NA FEMALE 52 REMOVED DUPLICATE
RUSSELL, JR. KERMITT PATRICK NA MALE 36 ACTIVE VERIFIED
SCARLETTE,CHARLES F, JR NA NA MALE 40 REMOVED DUPLICATE
SCHELIN,CHRISTOPHER, D NA NA MALE 39 REMOVED DUPLICATE
SHIPMAN,ELBERT LEE,J R NA NA MALE 45 REMOVED DUPLICATE
SIMS,RAYMOND LEE,SR NA NA NA MALE 66 REMOVED DUPLICATE
STANLEY,HUGH EATON,J R NA NA MALE 83 REMOVED DUPLICATE
SUTTER , III HOWARD EUGENE NA MALE 35 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
TAYLOR,JOHN MARION,J R NA NA MALE 67 REMOVED DUPLICATE
WADE,RODERICK WILSON ,JR NA NA MALE 57 REMOVED DUPLICATE
WALKER,CHARLES,JR NA NA NA MALE 56 REMOVED DUPLICATE
WALROND,CHRISTOPHER, WADE NA NA MALE 38 REMOVED DUPLICATE
WASHINGTON,SURADA,LA VONNE NA NA FEMALE 43 REMOVED DUPLICATE
WEATHERINGTON,III RICHARD B NA MALE 56 REMOVED REMOVED UNDER OLD PURGE LAW
WHITAKER,JAMES L,JR NA NA NA MALE 35 REMOVED DUPLICATE
WHITE,LEE E,JR NA NA NA FEMALE 35 REMOVED DUPLICATE
x <- d %>% 
  dplyr::filter(stringr::str_detect(first_name, ","))
dim(x)
[1] 51  8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
BELL,MITCHELL THOMAS ,II NA NA MALE 35 REMOVED DUPLICATE
BOGUE EUGENE, JR NMN NA MALE 69 ACTIVE LEGACY DATA
BROWN,FREDERIC CHEST ER,JR NA NA MALE 60 REMOVED DECEASED
BUNTON,RAYMOND AVNEY ,JR NA NA MALE 46 REMOVED DUPLICATE
BURNETTE,TOMMY WILLI AM,JR NA NA MALE 35 REMOVED DUPLICATE
CANIPE NOAH, NA JR MALE 70 ACTIVE VERIFIED
CARAWAN OTTIS, JR. NA NA MALE 85 ACTIVE LEGACY DATA
DAVIS OLANDORS, JR NA NA MALE 33 ACTIVE LEGACY DATA
DAVIS,LEWIS EVERETTE ,JR NA NA MALE 54 REMOVED DUPLICATE
DE MACARTY MACARTY, SHARON K NA FEMALE 53 REMOVED REMOVED UNDER OLD PURGE LAW
DU BREUIL, MARION ORTH NA FEMALE 79 REMOVED MOVED FROM COUNTY
EDWARDS,MARK BROWNLO W,JR NA NA MALE 37 REMOVED DUPLICATE
EFFLER WELZIE,SR. NA NA MALE 78 REMOVED ADMINISTRATIVE
EL RAMEY, BURGWYN BROW NA FEMALE 78 REMOVED DECEASED
GARRISON,JAMES MARVI N,JR NA NA MALE 57 REMOVED DUPLICATE
GIBBS CALEB, JR. NA NA MALE 80 ACTIVE LEGACY DATA
GREENE RALPH, NA JR MALE 46 INACTIVE CONFIRMATION NOT RETURNED
HAMMETT STANLEY, NA JR MALE 71 REMOVED ADMINISTRATIVE
HAULSEY ROY,JR. NA NA MALE 88 REMOVED ADMINISTRATIVE
HEIDE HEIDE, KENNETH NA MALE 48 REMOVED DUPLICATE
HENSLEY WILLIAM,JR. NA NA FEMALE 84 REMOVED ADMINISTRATIVE
HICKS MARION, NA SR MALE 58 ACTIVE VERIFIED
HILLIARD LONNIE, JR. NA MALE 66 REMOVED DECEASED
HOOKER,GIRRIE MATHIS ,III NA NA MALE 48 REMOVED DUPLICATE
JOHNSON,BILLY TURNER ,JR NA NA MALE 44 REMOVED DUPLICATE
KREIDER JAMES,JR. NA NA MALE 41 REMOVED ADMINISTRATIVE
LA SHIER, JAMES RATHBU NA MALE 44 REMOVED REMOVED UNDER OLD PURGE LAW
LA SHIER, TAMMY LOUISE NA FEMALE 43 REMOVED REMOVED UNDER OLD PURGE LAW
LE RENDU, LESLEY WALTE NA MALE 79 REMOVED REMOVED UNDER OLD PURGE LAW
LOGAN LEO,JR. NA NA MALE 64 ACTIVE LEGACY DATA
LUCAS,KENNETH SHELTO N,JR NA NA MALE 45 REMOVED DUPLICATE
MAJETTE,GEORGE THURM AN,JR NA NA MALE 35 REMOVED DUPLICATE
MCADAMS WILL,JR NA NA MALE 56 ACTIVE VERIFIED
MCCRARY,RICHARD DALE ,JR NA NA MALE 41 REMOVED DUPLICATE
MCKINNEY LUTHER, NA JR MALE 76 REMOVED ADMINISTRATIVE
MEANS JASPER, NA JR MALE 79 REMOVED ADMINISTRATIVE
MEULEBROECKE MEULEBROECKE, HELENE NA FEMALE 40 REMOVED DUPLICATE
MOREHEAD,JESSE JAMES ,JR NA NA MALE 42 REMOVED DUPLICATE
PFAFF PFAFF, EMILY NA FEMALE 43 REMOVED ADMINISTRATIVE
PHILLIPS FRANK, NA JR MALE 50 ACTIVE VERIFIED
PROCTOR J.D., NA JR FEMALE 76 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
PURGASON,SILAS WILSO N,JR NA NA MALE 84 REMOVED DUPLICATE
REYES,CHARLES MANUEL ,JR NA NA MALE 53 REMOVED DUPLICATE
RIDDLE DEWITT,JR. NA NA MALE 72 ACTIVE LEGACY DATA
RUTHERFORD BRINGER,JR. NA NA MALE 70 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
SAWYER NAT, JR NA NA MALE 82 REMOVED DECEASED
SCHULTZ STANLEY, JR NA NA MALE 66 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
SWEPSON CECIL, NA SR MALE 77 REMOVED ADMINISTRATIVE
VAN DEMAN, TATE NA MALE 60 REMOVED REMOVED UNDER OLD PURGE LAW
WADE,RODERICK WILSON ,JR NA NA MALE 57 REMOVED DUPLICATE
ZANDE ZANDE, CHARLES NA MALE 76 REMOVED ADMINISTRATIVE
x <- d %>% 
  dplyr::filter(stringr::str_detect(midl_name, ","))
dim(x)
[1] 58  8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
ADKINS CHARLES ALLEN, JR. NA MALE 39 REMOVED MOVED FROM COUNTY
ANDREWS JAMES CARNELL, J NA MALE 34 REMOVED FELONY CONVICTION
BALLERO VIRGINIA MARY, D NA FEMALE 64 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BARNES RUSSELL JOSEPH, J NA MALE 71 ACTIVE VERIFIED
BATTLE ANNIE RAY, TAYLOR NA FEMALE 56 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BRASWELL ROBERT ELLIS, J NA MALE 47 ACTIVE VERIFIED
BROUSSARD DONALD JAMES, II NA MALE 40 REMOVED MOVED FROM COUNTY
CINQUEMANI ANTHONY LOUIS,III NA MALE 45 ACTIVE LEGACY DATA
CLARK COLEMAN JACKSON, I NA MALE 36 ACTIVE VERIFIED
CLEMMONS ALTON BEAMAN, I III MALE 40 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
COPPAGE JOESPH EDWARD, J NA MALE 31 REMOVED FELONY CONVICTION
COVINGTON EDNA(MRS PERRY, JR) NA FEMALE 0 ACTIVE VERIFIED
DAIL JR. ERNEST, VERNON NA MALE 47 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DAVIS ANN STRAY, GUNDERS NA FEMALE 47 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DAVIS JO ANN, W NA FEMALE 47 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DOZIER ROSA LEE, DEW NA FEMALE 81 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
EDWARDS ANNIE B., DIXON NA FEMALE 92 REMOVED DECEASED
EVERETTE JO ANN, KIRKMAN NA FEMALE 46 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
FAUCETTE JESSE EDWARD, J NA MALE 66 ACTIVE VERIFIED
FERGUSON STANTON HYDE, J NA MALE 57 ACTIVE VERIFIED
FOX ANNA MAE, HILLIARD NA FEMALE 83 REMOVED DECEASED
GATLING EVA GERTRUDE, B. NA FEMALE 0 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
GAY ROBERT HENRY, III. NA MALE 32 ACTIVE VERIFIED
GLOVER JO ANN, PATE NA FEMALE 56 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
GRAHAM BARBARA M., VANN NA FEMALE 41 REMOVED MOVED FROM COUNTY
GREESON WELDON RONNIE, S NA MALE 68 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
HUFFMAN LUTHER G, NA MALE 68 REMOVED REMOVED UNDER OLD PURGE LAW
HUGHES DEWEY , JR MALE 68 REMOVED ADMINISTRATIVE
JACKSON ROBERT EUGENE,JR NA MALE 36 REMOVED REMOVED UNDER OLD PURGE LAW
JONES JOHN H, NA MALE 80 REMOVED REMOVED UNDER OLD PURGE LAW
LAMPERT SADRON C, III MALE 62 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
LEDFORD WANDA M, NA FEMALE 56 REMOVED MOVED FROM COUNTY
LEE JOSEPH EDWIN, III MALE 53 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
LEWIS JR JAMES, THOMAS NA MALE 63 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
MACKLIN ARGIE LENE, PARK NA FEMALE 73 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
MARTIN LLOYD FRANKLIN, S NA MALE 49 ACTIVE VERIFIED
MELLON JANET C, BONI NA FEMALE 56 REMOVED MOVED FROM COUNTY
NEWSOME MATTIE RUTH, P. NA FEMALE 82 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
PIERCE RUTH P, NA FEMALE 75 ACTIVE VERIFIED
PITTMAN JERRY WALLACE, I III MALE 41 REMOVED FELONY CONVICTION
PROCTOR WILLIAM EDSEL, J NA MALE 50 ACTIVE LEGACY DATA
PULLEY ADA MAE, GRAY NA FEMALE 75 ACTIVE LEGACY DATA
SCARBOROUGH JOHN R, NA MALE 82 ACTIVE VERIFIED
SCHMALTZ WILLIAM FRANK, IV MALE 59 REMOVED DECEASED
SHEARIN ANDREW THOMAS, S NA MALE 49 ACTIVE VERIFIED
SIMMONS JAMES EDWARDS, J NA MALE 38 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
STEELE NELSON GILBERT, J NA MALE 58 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
STHRESHLEY LAWRENCE FITZHUGH, III MALE 47 REMOVED REMOVED UNDER OLD PURGE LAW
STOCKS JAMES ALLAN, IV. NA MALE 33 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
TAYLOR JAMES ROBINSON, J NA MALE 42 REMOVED DUPLICATE
THOMAS HERBERT STUART, J NA MALE 57 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
THOMAS MARY MATTHEW, EDW NA FEMALE 105 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
TILLERY GEORGE THOMAS, S NA MALE 52 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
VAN DEN BERG, LINDA NA FEMALE 38 REMOVED MOVED FROM COUNTY
WASHINGTON JO ANN, FLOYD NA FEMALE 67 ACTIVE LEGACY DATA
WILLIAMS DONNIE MAE, MRS NA FEMALE 89 REMOVED DECEASED
WILLIAMS ERVIN W., SR., NA MALE 38 ACTIVE VERIFIED
WOODS JR. CHARLES, LEWIS NA MALE 58 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
  • 172 names with comma
  • Many commas occur where multiple name components have been entered in the last name, e.g. “AMATO,KATHERINE,M”, “HODNETT,DORGIE,JR”
  • Some commas occur where multiple name components have been entered in the first name, e.g. “EUGENE, JR”, “,JR”
  • Some commas occur where multiple name components have been entered in the middle name, e.g. “MARY, D”, “JAMES, II”, “MAE, MRS”, “W., SR.,”

Map comma to empty string
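
As a minimal sketch of what this mapping does, the following removes commas from a few values copied from the tables above. It uses base R's gsub() with fixed = TRUE so that it runs without any packages. The vector name example_names is hypothetical; how the mapping is actually applied to the name columns of d is not shown in this section.

```r
# Toy values copied from the comma tables above (not the real data frame `d`).
example_names <- c("AMATO,KATHERINE,M", "EUGENE, JR", "W., SR.,")

# fixed = TRUE treats "," as a literal string rather than a regular expression.
no_comma <- gsub(",", "", example_names, fixed = TRUE)
print(no_comma)
```

Where the comma was the only separator between name components, as in the first value, removing it runs the components together. Where a space followed the comma, the components stay separated.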

3.6.4.13 Check for backslash

x <- d %>% 
  dplyr::filter(stringr::str_detect(last_name, "\\\\"))
dim(x)
[1] 4 8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
BUFFKIN\ WESLEY RYAN NA MALE 19 ACTIVE VERIFIED
GOSHEN\ DIXIE M NA FEMALE 28 ACTIVE VERIFIED
PUTNAM\ TAMARA LEIGH NA FEMALE 44 INACTIVE CONFIRMATION NOT RETURNED
STRTHEIT\ LOLA C NA FEMALE 60 ACTIVE VERIFIED
x <- d %>% 
  dplyr::filter(stringr::str_detect(first_name, "\\\\"))
dim(x)
[1] 3 8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
BARBOUR DONNA IRENE\ P NA FEMALE 60 REMOVED DECEASED
MANUEL KEVIN\ NA NA MALE 43 ACTIVE LEGACY DATA
RHEA STEPHANIE\ LYNN NA FEMALE 29 ACTIVE VERIFIED
x <- d %>% 
  dplyr::filter(stringr::str_detect(midl_name, "\\\\"))
dim(x)
[1] 67  8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
ADDY ROBERT WILLIAM NA MALE 23 REMOVED REMOVED UNDER OLD PURGE LAW
ARRINGTON KRISTIN CELESTE NA FEMALE 34 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
ATKINS DEBRA L NA FEMALE 51 REMOVED REMOVED UNDER OLD PURGE LAW
BARNETTE DIANA LINN NA FEMALE 40 ACTIVE CONFIRMATION PENDING
BEESON PATRICIA ANN NA FEMALE 43 REMOVED REMOVED UNDER OLD PURGE LAW
BIESECKER EMILY E NA FEMALE 51 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BLACKWELL DONNA KAYE NA FEMALE 30 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BLACKWELL MELISSA D NA FEMALE 26 REMOVED REMOVED UNDER OLD PURGE LAW
BLEVINS MARY RUTH NA FEMALE 28 ACTIVE VERIFIED
BURNETTE TINA LYNN NA FEMALE 29 REMOVED DUPLICATE
CAPPS IVA MAY\ NA FEMALE 70 ACTIVE LEGACY DATA
CARSON CHRISTOPHER DEVON NA MALE 24 REMOVED REMOVED UNDER OLD PURGE LAW
CHANCE ELIZABETH ANN\ NA FEMALE 34 REMOVED REMOVED UNDER OLD PURGE LAW
CONKLIN SYLVINA ESTER NA FEMALE 75 REMOVED DECEASED
CONSTANTIN SHERRIE ANN NA FEMALE 29 REMOVED REMOVED UNDER OLD PURGE LAW
COOKE TIMOTHY DAVID\ SR MALE 46 REMOVED MOVED FROM COUNTY
EDWARDS JAMIE LYNN NA FEMALE 27 REMOVED MOVED FROM COUNTY
EMERY BRENDA JOYCE NA FEMALE 56 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
FLECK KERRI LEE NA FEMALE 34 REMOVED REMOVED UNDER OLD PURGE LAW
GOMBAR KATHRYN NA FEMALE 19 ACTIVE VERIFIED
GOODWIN WENDY DENISE NA FEMALE 36 REMOVED REMOVED UNDER OLD PURGE LAW
GOSNELL ELIZEBETH MARY NA FEMALE 45 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
HAGUE HEIDI CHARLENE NA FEMALE 30 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
HALL ALLISON DAWN BARRET NA FEMALE 33 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
HASKELL ALICE MARY NA FEMALE 105 REMOVED DECEASED
HASTINGS JUDITH NA FEMALE 60 REMOVED MOVED FROM COUNTY
HERNON VALERIE OLGA NA FEMALE 105 ACTIVE VERIFIED
JOHNSON DEBRA FAY NA FEMALE 49 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
JUSTUS JANE ANN NA FEMALE 67 REMOVED REMOVED UNDER OLD PURGE LAW
LIFORD TAMMI DENISE NA FEMALE 36 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
LILLY GRACIELLA LAMAS NA FEMALE 44 REMOVED DUPLICATE
LOVE CRYSTAL CHERIE\ NA FEMALE 26 INACTIVE CONFIRMATION NOT RETURNED
MARION WHYSHENA LANETA NA FEMALE 30 REMOVED REMOVED UNDER OLD PURGE LAW
MCENTYRE SHARON ELAINE NA FEMALE 49 REMOVED MOVED FROM COUNTY
MCMURRAY CHARLENE ANN NA FEMALE 40 REMOVED DUPLICATE
MELTON STEPHANIE STARR NA FEMALE 32 REMOVED MOVED FROM COUNTY
MITCHELL MADELINE RITA NA FEMALE 70 REMOVED MOVED FROM STATE
MOUZON JOANN \ NA FEMALE 49 ACTIVE CONFIRMATION PENDING
NOBILE THERESA MARY NA FEMALE 90 REMOVED REMOVED UNDER OLD PURGE LAW
OWENS WANDA JEAN NA FEMALE 39 REMOVED REMOVED UNDER OLD PURGE LAW
PARKER JAMIE LYNN\ NA MALE 30 ACTIVE LEGACY DATA
PERRY KATHLEEN MARIE NA FEMALE 27 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
PERRY LOUISE ELIZABETH\ NA FEMALE 76 ACTIVE VERIFIED
PETTY SHARON RENEE NA FEMALE 37 REMOVED REMOVED UNDER OLD PURGE LAW
REPASS EMMA LOU NA FEMALE 72 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
REYNOLDS SHERIE LYNNETTE NA FEMALE 34 REMOVED MOVED FROM COUNTY
RICE SHIRLEY MAE NA FEMALE 70 REMOVED REMOVED UNDER OLD PURGE LAW
RICKELMAN PATRICK LEO\ NA MALE 44 ACTIVE VERIFIED
RUSSELL PATSY REBECCA NA FEMALE 40 REMOVED DECEASED
RUX PATRICIA JEAN NA FEMALE 42 REMOVED MOVED FROM COUNTY
SIMONS JAREDD MARTIN NA MALE 27 REMOVED REMOVED UNDER OLD PURGE LAW
SOUTHERLAND DYANNA LYNNE NA FEMALE 49 REMOVED DECEASED
STECHSCHULTE STACY LYNN NA FEMALE 33 REMOVED REMOVED UNDER OLD PURGE LAW
STONER KENNETH NA MALE 58 ACTIVE VERIFIED
SULLIVAN DEVONIE GEARLINE\ NA FEMALE 65 INACTIVE CONFIRMATION NOT RETURNED
THOMPSON IVA LA JUANA NO NA FEMALE 48 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
THOMPSON MODELYN DAWN NA FEMALE 105 REMOVED REMOVED UNDER OLD PURGE LAW
TWITTY MARY JOANNE NA FEMALE 29 REMOVED DUPLICATE
TWITTY SHERRY RENEE NA FEMALE 32 REMOVED MOVED FROM COUNTY
VAN SICKLE CATHY LYNN NA FEMALE 54 REMOVED REMOVED UNDER OLD PURGE LAW
VANBUMBLE JONANNA KAY NA FEMALE 46 REMOVED MOVED FROM COUNTY
WARDHAMMAR DARLENE RUTH NA FEMALE 58 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
WHISNANT JEWEL ANN NA FEMALE 62 REMOVED DECEASED
WILLIAMS WENDY LYNN NA FEMALE 39 REMOVED REMOVED UNDER OLD PURGE LAW
WILSON MAE LORINE NA FEMALE 44 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
WILSON PATRICIA ANN NA FEMALE 62 REMOVED DUPLICATE
WULFING AROL ROSE NA FEMALE 67 REMOVED REMOVED UNDER OLD PURGE LAW
  • 74 names with backslash
  • Some appear to be terminal markers, e.g. BUFFKIN\, KEVIN\
  • Some are used as substitutes for whitespace, e.g. WILLIAM\BOND, ROSE\MERSON

Map backslash to empty string
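
The four backslashes in the detection pattern above come from two layers of escaping. This sketch illustrates that and the mapping itself on toy values modelled on the examples in the list above. The names example_names, pattern, has_backslash and no_backslash are hypothetical, and base R is used instead of stringr to keep the sketch dependency-free.

```r
# In R source code "\\" is ONE backslash character, so these toy values
# are BUFFKIN\, KEVIN\, WILLIAM\BOND and SMITH.
example_names <- c("BUFFKIN\\", "KEVIN\\", "WILLIAM\\BOND", "SMITH")

# The R string "\\\\" holds two backslash characters; the regex engine
# reads those two characters as one escaped, literal backslash.
pattern <- "\\\\"
stopifnot(nchar(pattern) == 2)

# Detect values that contain a backslash.
has_backslash <- grepl(pattern, example_names)

# With fixed = TRUE there is no regex layer, so a single backslash
# (written "\\" in R source) is enough to remove it.
no_backslash <- gsub("\\", "", example_names, fixed = TRUE)
print(no_backslash)
```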

3.6.4.14 Check for parentheses

x <- d %>% 
  dplyr::filter(stringr::str_detect(last_name, "[()]"))
dim(x)
[1] 22  8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
BAKER (MCFADYEN) MARY WORTHY NA FEMALE 30 REMOVED MOVED FROM COUNTY
BAREFOOT (RHINE) CAROL JEAN STRIDER NA FEMALE 60 REMOVED MOVED FROM COUNTY
CARSON (WADE) PRISCILLA ANN NA FEMALE 33 REMOVED MOVED FROM COUNTY
COLLINS (SISTER) M GRETA NA FEMALE 86 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
COTHERN (BLAKE) JUDITH C. NA FEMALE 46 REMOVED MOVED FROM COUNTY
EDENS (ARCHAMBAU KELLY ROSE NA FEMALE 36 REMOVED MOVED FROM COUNTY
EVANS (ABBOTT) GWENDOLYN DUNITA NA FEMALE 33 REMOVED MOVED FROM COUNTY
FEE (SISTER) HELENE NA NA FEMALE 67 REMOVED MOVED FROM COUNTY
FOSTER (KING) STACY LEIGH NA FEMALE 35 REMOVED MOVED FROM COUNTY
HUDSON (HALL) PAMELA JO NA FEMALE 52 REMOVED MOVED FROM COUNTY
JOHNSON (BLIND VOT MARTHA GLADYS NA FEMALE 101 REMOVED DECEASED
KINLAW (GUIN) LORI ANN NA FEMALE 44 REMOVED MOVED FROM COUNTY
MCCLAIN (SISTER) M MILDRED NA FEMALE 77 REMOVED MOVED FROM COUNTY
MCDONOUGH (SISTER) M BERNITA NA FEMALE 90 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MCMILLIAN (MUMFO BETTY ANN NA FEMALE 55 REMOVED MOVED FROM COUNTY
MCQUEEN (MORRISE MARY LOUISE NA FEMALE 47 REMOVED MOVED FROM COUNTY
MOCCIA (SMITH) DONNA MARIE NA FEMALE 44 REMOVED MOVED FROM COUNTY
NEESE (BLIND VOTER HOWARD CLARENCE NA MALE 90 ACTIVE LEGACY DATA
NICHOLS (NORTON) JOY FERGUSON NA FEMALE 45 REMOVED MOVED FROM COUNTY
PALMER(BRIGGS) MARILYN P NA FEMALE 61 REMOVED REMOVED UNDER OLD PURGE LAW
PARISH (RAMON) ROSE MARIE NA FEMALE 41 REMOVED MOVED FROM COUNTY
SYKES (BRICKHOUSE) ANTHONY E. NA FEMALE 73 REMOVED DECEASED
x <- d %>% 
  dplyr::filter(stringr::str_detect(first_name, "[()]"))
dim(x)
[1] 105   8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
AIKEN O (LULLIE) H (LYON) NA FEMALE 107 REMOVED DECEASED
AINSLEY J. (JULIUS) T.(THOMAS) NA MALE 65 ACTIVE VERIFIED
ANDERS L(NN) C(NN) JR MALE 46 REMOVED MOVED FROM COUNTY
ANDERSON MARGARET (MEG) WILLIAM NA FEMALE 52 ACTIVE VERIFIED
ARMSTRONG GEORGE (BERT) H NA MALE 42 REMOVED REMOVED UNDER OLD PURGE LAW
BAILES RUBY (POLLY) BURTON NA FEMALE 82 ACTIVE LEGACY DATA
BAIRD N (MARY) R (ROYALL) JR FEMALE 88 REMOVED DECEASED
BALL JO(JORETTA) DEVINNEY NA FEMALE 74 REMOVED REMOVED UNDER OLD PURGE LAW
BARKER C (BESSIE) M NA FEMALE 0 ACTIVE VERIFIED
BECK W (WILLIAM) H (HARVEY) NA MALE 100 REMOVED DECEASED
BEHELER EUNICE(PAT) ROPER NA FEMALE 58 ACTIVE VERIFIED
BENNETT D (MAURINE) M NA FEMALE 89 ACTIVE VERIFIED
BORDERS EUGENE(NMN) NA NA MALE 61 ACTIVE VERIFIED
BROWN JUDITH (JUDE) BROMHALL NA FEMALE 58 REMOVED REMOVED UNDER OLD PURGE LAW
BRUTON DANIEL (DANNY C NA MALE 44 ACTIVE VERIFIED
BRYANT ADA (POLLY) BOGGS NA FEMALE 70 ACTIVE LEGACY DATA
BULLOCK FRANK (WILMA) W NA FEMALE 94 REMOVED DECEASED
BULLOCK P (ROSALIND) C NA FEMALE 0 REMOVED DECEASED
CAGLE JOHN (JACK) F NA MALE 37 ACTIVE VERIFIED
CAMERON LEON(BLUE) GIBSON NA MALE 76 REMOVED DECEASED
CAPEL JAMES (JIM) NA NA MALE 73 REMOVED MOVED FROM COUNTY
CHANDLER W.(WALTER) CARL NA MALE 64 ACTIVE VERIFIED
COVINGTON EDNA(MRS PERRY, JR) NA FEMALE 0 ACTIVE VERIFIED
CURRIN WILLIAM(BILL) JOSEPH NA MALE 56 ACTIVE LEGACY DATA
DANCE CAROL( CAROLYN NA FEMALE 60 REMOVED REMOVED UNDER OLD PURGE LAW
DANIEL W (WYATT) O (OWEN) NA MALE 101 REMOVED DECEASED
DEMOSS JERREL (JERRY LYNN NA MALE 53 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DIXON SUSAN (SUSIE) SHIELDS NA FEMALE 77 REMOVED DECEASED
DOVER NELSON (ETHEL H NA FEMALE 85 REMOVED DECEASED
DOZIER W C (MICKEY) NA NA MALE 70 ACTIVE LEGACY DATA
DUNN MARY (“PETE”) BURNETTE NA FEMALE 71 ACTIVE VERIFIED
EDWARDS CARL (CORY) STEWART JR MALE 27 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
FLOYD JOAN (JONI) H NA FEMALE 54 REMOVED MOVED FROM COUNTY
FRINK AL (NANCY) CLAYTESE NA FEMALE 69 ACTIVE VERIFIED
GARBARINO SENES (ED) E NA MALE 87 REMOVED REMOVED UNDER OLD PURGE LAW
GOOCH JOSEPH (JOE) W NA MALE 79 ACTIVE VERIFIED
GOWAN BURNICE ( DEAN) NA NA MALE 46 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
GREENE B F ( WARREN NA MALE 85 ACTIVE VERIFIED
GROSS WALTER (WALLY P NA MALE 94 ACTIVE VERIFIED
GUTHRIE F (ROSA) W (WHEELER NA FEMALE 96 REMOVED DECEASED
HAMILTON EVANS (RED) SYMINGTON NA MALE 87 REMOVED DECEASED
HARRINGTON LAWRENCE(LARR C NA MALE 51 REMOVED REMOVED UNDER OLD PURGE LAW
HAYES VIVIAN (BETH) WRIGHT NA FEMALE 44 REMOVED MOVED FROM COUNTY
HAYNES (MARTHA) DIANE NA FEMALE 55 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
HOLBROOK C (FANNIE) L (BELLE) NA FEMALE 81 REMOVED DECEASED
HOWELL B (PEARL) D (SEARS) NA FEMALE 104 REMOVED DECEASED
JHANJI (ANUPAN) ANDY NA MALE 41 REMOVED MOVED FROM COUNTY
KERN O (BUDDY) R NA MALE 67 ACTIVE VERIFIED
LANCASTER AMORITA (AMY) REQUENA NA FEMALE 30 ACTIVE VERIFIED
LAWS BJ(NMN) NA NA MALE 57 REMOVED MOVED FROM COUNTY
LITTLE JAMES (BUCK) NA NA MALE 74 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
LOFTIN WILLIAM (BILL M NA MALE 87 REMOVED REMOVED UNDER OLD PURGE LAW
LOUGHNEY CAROL (SISTE NA NA FEMALE 67 REMOVED REMOVED UNDER OLD PURGE LAW
MANGUM O (MAUDE) T (LOANE) NA FEMALE 109 REMOVED DECEASED
MANN WILLIAM (BILL MURRAY NA MALE 45 ACTIVE LEGACY DATA
MARVIN JEAN( IMOGENE NA FEMALE 72 ACTIVE VERIFIED
MATTHEWS JAMES (BUCK) PERCY NA MALE 58 ACTIVE LEGACY DATA
MAY J (MINNIE) O (B) NA FEMALE 90 REMOVED DECEASED
MCAULAY CHARLES (CHIP T NA MALE 42 REMOVED MOVED FROM COUNTY
MULLINIX SARAH (CAROL) D NA FEMALE 60 REMOVED REMOVED UNDER OLD PURGE LAW
NEGUS JOSEPH (JOE) SAMUEL NA MALE 37 ACTIVE VERIFIED
NEWTON JOAN (INEZ) K NA FEMALE 57 ACTIVE LEGACY DATA
NICHOLS DORIS ( MRS W NA NA FEMALE 92 ACTIVE VERIFIED
NICKELL GENEVA(GINNI) B NA FEMALE 59 ACTIVE LEGACY DATA
NOVOTKA JANICE (SISTE NA NA FEMALE 45 REMOVED REMOVED UNDER OLD PURGE LAW
PADGETT JOE (DR) C. NA MALE 81 REMOVED DECEASED
PARKER ANGELA (SISTE MARY NA FEMALE 77 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
PARKER J E (BUCK) NA NA MALE 81 REMOVED DECEASED
PARRISH THOMAS (JACK) JACKSON NA MALE 74 ACTIVE VERIFIED
POOLE SALLIE (PAT) WARREN NA FEMALE 82 ACTIVE LEGACY DATA
POTEAT (KAY) ANNE CATH NA FEMALE 60 ACTIVE VERIFIED
PRIVOTT G H (JACK) JR NA MALE 82 REMOVED DECEASED
QUEEN GERALDINE(NMN NA NA FEMALE 61 ACTIVE VERIFIED
RAMSEY ALICIA(LISA) PATRICK NA FEMALE 39 ACTIVE LEGACY DATA
REAMS ALICIA (LISA) PATRICK NA FEMALE 39 REMOVED MOVED FROM COUNTY
RICE (REV) CALVIN SHIRLEY NA MALE 79 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
RIDDICK ROBERT(BOB) W NA MALE 33 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
ROGERS J (MARY) H (ELLINGTON NA FEMALE 0 REMOVED DECEASED
SAUNDERS J.C. (MIKE) NA NA MALE 109 REMOVED REMOVED UNDER OLD PURGE LAW
SHAVER BRUCE (BJ) EUGENE JR MALE 26 REMOVED MOVED FROM COUNTY
SIMPSON DEBRA (DEBBIE) MORROW NA FEMALE 53 ACTIVE LEGACY DATA
SPEER HOWARD (HAL) L JR MALE 43 REMOVED REMOVED UNDER OLD PURGE LAW
SPENCER ELLA ( JACKIE WARD NA FEMALE 64 REMOVED MOVED FROM STATE
SPENCER JAMES (JIM) N NA MALE 56 ACTIVE VERIFIED
SPROUSE ROBERT (BOBBY A JR MALE 61 REMOVED REMOVED UNDER OLD PURGE LAW
STALLINGS ELIZABETH (BE LEA NA FEMALE 63 INACTIVE CONFIRMATION NOT RETURNED
STEM R (ESTELLE) O NA FEMALE 102 REMOVED DECEASED
STRICKLAND BENJAMIN(BEN) F NA MALE 77 ACTIVE VERIFIED
SWINDELL A (LINDA) B IV FEMALE 56 REMOVED MOVED FROM COUNTY
THOMPSON LILLIE (OLLIE B NA FEMALE 89 REMOVED DECEASED
TIPPETT J (BIRDIE K) G NA FEMALE 92 REMOVED DECEASED
TRIPLETT S.R.(JACK) NA NA MALE 83 REMOVED ADMINISTRATIVE
TURNER J D (DOC) NA NA MALE 89 REMOVED DECEASED
TYME (NO OTHER NAM NA NA FEMALE 42 REMOVED MOVED FROM STATE
VANNICOLA ANGELIQUE(SIS NA NA FEMALE 64 REMOVED REMOVED UNDER OLD PURGE LAW
WALKER (MRS) LOLA M NA FEMALE 112 REMOVED REMOVED UNDER OLD PURGE LAW
WHITFIELD W (MARJORIE) W (LYON) NA MALE 95 REMOVED DECEASED
WOOD E. H. (SONNY) NA III MALE 43 REMOVED REMOVED UNDER OLD PURGE LAW
WOODSON FAYE (T.) VENERABLE NA FEMALE 46 ACTIVE VERIFIED
YANCEY J (THELMA) T (LOU) NA FEMALE 103 REMOVED DECEASED
x <- d %>% 
  dplyr::filter(stringr::str_detect(midl_name, "[()]"))
dim(x)
[1] 2107    8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
ADAMS MARK (NMN) NA MALE 89 ACTIVE LEGACY DATA
AINSLEY J. (JULIUS) T.(THOMAS) NA MALE 65 ACTIVE VERIFIED
ALDRIDGE JAMES MICHAEL (MIK NA MALE 53 ACTIVE VERIFIED
ALVIN MARK (NMN) NA MALE 53 REMOVED MOVED FROM COUNTY
ANTONE OSCAR (NMN) SR MALE 95 REMOVED DECEASED
ARRINGTON ROBERT (MOLLIE) A SR FEMALE 102 REMOVED DECEASED
AVERETTE MAYNARD (WESLEY M.) NA MALE 69 REMOVED DECEASED
AWBREY KATHERINE EUNICE (GREE NA FEMALE 88 REMOVED REMOVED UNDER OLD PURGE LAW
BAGGETT J R (CATHE NA FEMALE 97 REMOVED DECEASED
BAILEY LOYD (NMN) NA MALE 61 ACTIVE LEGACY DATA
BAITY LEROY (NMN) NA MALE 65 ACTIVE VERIFIED
BALTEZORE ALLEN (NMN) NA MALE 74 REMOVED DECEASED
BANKS RUBY JEAN ( BASNI NA FEMALE 49 ACTIVE VERIFIED
BARLEY GEORGE (NMN) NA MALE 88 REMOVED MOVED FROM COUNTY
BARNES HENSON P (MARY) NA FEMALE 67 ACTIVE VERIFIED
BASNIGHT EDNA A ( TATEM ) NA FEMALE 66 ACTIVE VERIFIED
BASS H J (HUBERT) NA MALE 74 ACTIVE VERIFIED
BATEMAN MRS W E (POLLY) NA FEMALE 93 REMOVED DECEASED
BAYNARD CLIFFORD (NMN) JR MALE 79 REMOVED DECEASED
BEACH MARION C (SUSIE) NA FEMALE 60 ACTIVE VERIFIED
BLANCHARD RUTH (NMN) NA FEMALE 86 REMOVED MOVED FROM COUNTY
BOONE CARSIE (NMN) NA FEMALE 78 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BOONE ELVA (MAE) NA FEMALE 72 ACTIVE LEGACY DATA
BRANN ROBERT (MARGARET) NA FEMALE 92 REMOVED DECEASED
BROWN ANNIE C (MRS) NA FEMALE 109 REMOVED REMOVED UNDER OLD PURGE LAW
BRYANT WILLIAM E (MRS ) NA FEMALE 0 ACTIVE LEGACY DATA
BURGESS MATTIE (MRS VER FEMALE 98 REMOVED DECEASED
BURRELL WILLIAM JON (TOBY) NA MALE 44 REMOVED REMOVED UNDER OLD PURGE LAW
BURRUS JAMES H (B) NA MALE 68 ACTIVE LEGACY DATA
CAMENZIND PAULA (NMN) NA FEMALE 55 REMOVED MOVED FROM COUNTY
CARSON JEFF (SCOTT) NA MALE 41 REMOVED REMOVED UNDER OLD PURGE LAW
CASSTEVENS RALPH (NMN) NA MALE 56 ACTIVE LEGACY DATA
CHERRY BRENDA P.(GARRETT) NA FEMALE 49 ACTIVE VERIFIED
CIOTTI BERNARD (NMN) NA MALE 89 ACTIVE LEGACY DATA
CLAYTON MAYANNA C (MRS) NA FEMALE 114 REMOVED REMOVED UNDER OLD PURGE LAW
COVINGTON KATHERINE L (LOUISE) NA FEMALE 0 ACTIVE VERIFIED
CRANE HOMER (NMN) NA MALE 68 REMOVED DECEASED
CULP BERNICE (NMN) NA FEMALE 106 REMOVED REMOVED UNDER OLD PURGE LAW
DAVES LEON (NMN) NA MALE 75 REMOVED REMOVED UNDER OLD PURGE LAW
DAVIS ELOISE (NMN) NA FEMALE 68 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DAVIS LACY (SUE B ) NA FEMALE 0 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
DEDNER WOLFGANG (NMN) NA MALE 83 REMOVED MOVED FROM COUNTY
DICKEY LINDA L.(DAILEY) NA FEMALE 46 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DUNCAN PETTY (PEGGY) LOU NA FEMALE 75 ACTIVE LEGACY DATA
ELLERBE JIMMIE H (MRS ) NA FEMALE 0 REMOVED DECEASED
FOX ELIZABETH (BETSY) C NA FEMALE 84 REMOVED DECEASED
FRYE LACY V (BUCK) NA MALE 73 ACTIVE VERIFIED
GIBBS JAMES E (A) NA MALE 93 REMOVED DECEASED
HALL BETTY (SUNNY) KEEF NA FEMALE 54 ACTIVE LEGACY DATA
HAMLIN ELIZABETH A F (BETTY) NA FEMALE 55 ACTIVE LEGACY DATA
HARRIS SAMANTHA (NMN) NA FEMALE 37 REMOVED REMOVED UNDER OLD PURGE LAW
HASSELL SYLVESTER (NMN) NA MALE 88 REMOVED REMOVED UNDER OLD PURGE LAW
HAWKINS JUDY WRENN (MRS) NA FEMALE 59 REMOVED REMOVED UNDER OLD PURGE LAW
HAWKS REGINA ANN (KELLY) NA FEMALE 52 REMOVED MOVED FROM COUNTY
HICKS JAMES C (PETE) NA MALE 73 INACTIVE CONFIRMATION NOT RETURNED
HOOPER LARAE (ANITA) NA FEMALE 52 ACTIVE VERIFIED
HYLEMON KENNETH (NMN) NA MALE 40 REMOVED DECEASED
JACKSON CARL (MRS) NA FEMALE 85 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
JONES JULIA (LORI) COPE NA FEMALE 41 ACTIVE VERIFIED
JUDGE SARAH LYNN(DAVIDSO NA FEMALE 42 ACTIVE VERIFIED
KNAPP WILLIAM D (BILLY) NA MALE 37 REMOVED REMOVED UNDER OLD PURGE LAW
KNIGHT MRS R S ( RUTH ) NA FEMALE 100 REMOVED DECEASED
LATTA MYRTLE ANDREWS (MRS NA FEMALE 89 REMOVED REMOVED UNDER OLD PURGE LAW
LITTLE MARY ELIZABETH (B NA FEMALE 39 ACTIVE CONFIRMATION PENDING
LUKER WALLACE (NMN) NA MALE 64 ACTIVE VERIFIED
MANESS HAROLD M (CHIP) JR MALE 54 ACTIVE VERIFIED
MCNEILL MYRTLE JEAN (JEANNI NA FEMALE 60 ACTIVE LEGACY DATA
MILLER MARY (KATHERINE) NA FEMALE 58 ACTIVE LEGACY DATA
MITCHELL VERNIE VIRGIL (VV) NA MALE 84 REMOVED DECEASED
MORAN JOSEPH STEPHEN(STEV NA MALE 45 ACTIVE VERIFIED
MORGAN DOYLE (ETTA) JANE NA FEMALE 85 ACTIVE VERIFIED
NICHOLLS CHARLOTTE (KAY) NA FEMALE 72 ACTIVE VERIFIED
NICHOLS JOHN H (MRS ) NA FEMALE 74 ACTIVE LEGACY DATA
NORMAN CASSANDRA (NMN) NA FEMALE 41 REMOVED REMOVED UNDER OLD PURGE LAW
ORMSBY MARY ALICE(BENNET NA FEMALE 62 ACTIVE LEGACY DATA
PARKER ANNIE MAY (CAMERON NA FEMALE 88 REMOVED DECEASED
PARKER JACQUELYN P (MRS) NA FEMALE 75 REMOVED REMOVED UNDER OLD PURGE LAW
PARRISH BETTIE HUNT (MRS) NA FEMALE 112 REMOVED REMOVED UNDER OLD PURGE LAW
PASCHALL PENNY (PENELOPE) NA FEMALE 107 REMOVED DECEASED
PHILYAW ANTHONY ALLEN (TONY) NA MALE 44 REMOVED REMOVED UNDER OLD PURGE LAW
PHILYAW MARVIN HIRAM (HANK) NA MALE 59 REMOVED REMOVED UNDER OLD PURGE LAW
POOLE MARY (JO ANN) NA FEMALE 73 ACTIVE LEGACY DATA
REAVES ALLIE MARGARET (MR NA FEMALE 96 REMOVED REMOVED UNDER OLD PURGE LAW
REECE ROY W (BILL) JR MALE 67 ACTIVE VERIFIED
REYNOLDS CECIL D (C.J.) JR MALE 59 ACTIVE VERIFIED
RITTENHOUSE FLORENCE PERRY (MRS) NA FEMALE 104 REMOVED REMOVED UNDER OLD PURGE LAW
ROYSTER BERNICE T (HOBSON) NA FEMALE 47 ACTIVE LEGACY DATA
SCARLETT MARY TYSON (MRS) NA FEMALE 70 REMOVED REMOVED UNDER OLD PURGE LAW
SEDBERRY CECIL EUGENE (RED) NA MALE 79 REMOVED DECEASED
SELBY MRS J D (VIVIAN) NA FEMALE 97 REMOVED DECEASED
SHAHBAZ JILL (NMN) NA FEMALE 30 REMOVED MOVED FROM COUNTY
SPEER LESA ANN (SMITH) NA FEMALE 39 ACTIVE VERIFIED
SPENCER WILLIAM JACOB(JAKIE) NA MALE 51 ACTIVE VERIFIED
VOLIVA R. O (OKLEY) NA MALE 91 ACTIVE CONFIRMATION PENDING
WALKER MADELINE HARRIS (MRS) NA FEMALE 99 REMOVED REMOVED UNDER OLD PURGE LAW
WILLIAMSON ANTHONY T (TONY) NA MALE 48 REMOVED MOVED FROM COUNTY
WOODLEY MRS WALLACE ( RUTH ) NA FEMALE 78 ACTIVE VERIFIED
WOODS GLENDA LOU (TILLEY) NA FEMALE 59 ACTIVE LEGACY DATA
WRIGHT CORA C (JANE) NA FEMALE 86 REMOVED MOVED FROM COUNTY
YANCEY J (THELMA) T (LOU) NA FEMALE 103 REMOVED DECEASED
  • 2,234 names with parentheses
  • Some appear to be maiden names, e.g. CARSON (WADE), MOCCIA (SMITH)
  • Some appear to be explanatory notes, e.g. NEESE (BLIND VOTER, JOHNSON (BLIND VOT
  • Some appear to indicate absence of a name, e.g. L(NN) = L (no name), (NMN) = (no middle name)
  • Some appear to be honorific titles, e.g. FEE (SISTER), JOE (DR), (MRS)

Map parentheses to empty string
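
A minimal sketch, assuming the mapping deletes only the parenthesis characters themselves and leaves the enclosed text in place. The toy values are copied from the tables above; the names example_names and no_parens are hypothetical.

```r
# Toy values copied from the parentheses tables above.
example_names <- c("CARSON (WADE)", "L(NN)", "(NMN)", "NEESE (BLIND VOTER")

# Inside a bracket expression "(" and ")" are literal characters,
# so "[()]" matches either parenthesis.
no_parens <- gsub("[()]", "", example_names)
print(no_parens)
```

Under this assumption the text that was inside the parentheses (maiden names, notes, NMN markers, titles) stays in the name field.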

3.6.4.15 Check for braces

x <- d %>% 
  dplyr::filter(stringr::str_detect(last_name, "[{}]"))
dim(x)
[1] 0 8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
x <- d %>% 
  dplyr::filter(stringr::str_detect(first_name, "[{}]"))
dim(x)
[1] 1 8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
COLLINS MARY {HOLLY} HOLLOWELL NA FEMALE 36 REMOVED REMOVED UNDER OLD PURGE LAW
x <- d %>% 
  dplyr::filter(stringr::str_detect(midl_name, "[{}]"))
dim(x)
[1] 4 8
x %>%   
  dplyr::slice_head(n = 100) %>% 
  dplyr::arrange(last_name, first_name, midl_name) %>% 
  knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
FRIZZELL JOHN {ALEX}ANDER NA MALE 99 REMOVED DECEASED
LILLEY MRS G C {MARJORIE NA FEMALE 77 REMOVED DECEASED
SHIPLEY JAMES NA MALE 48 REMOVED REMOVED UNDER OLD PURGE LAW
WHITE MARYA E {P H } NA FEMALE 110 REMOVED DECEASED
  • 5 names with braces
  • Some appear to be nicknames, e.g. MARY {HOLLY}, {ALEX}ANDER
  • Some might be OCR errors, e.g. D}@IS = DENIS

Map braces to empty string

3.6.4.16 Check for other characters

d %>% 
  dplyr::select(last_name) %>%
  dplyr::filter(stringr::str_detect(last_name, "[^-a-zA-Z0-9/_%\'\"\\\\*\`~ \\.,\\\\(){}]"))
# A tibble: 8 x 1
  last_name        
  <chr>            
1 ;PGEMAN          
2 O;NEAL           
3 RIDGWAY;         
4 O=BOZOVICH       
5 MOSELY]          
6 BREED;PVE        
7 CPP[ER           
8 CHAVIES & CHAVIES
d %>% 
  dplyr::select(first_name) %>%
  dplyr::filter(stringr::str_detect(first_name, "[^-a-zA-Z0-9/_%\'\"\\\\*\`~ \\.,\\\\(){}]"))
# A tibble: 8 x 1
  first_name   
  <chr>        
1 MERLE  ATTN!!
2 JOSEPH#      
3 FRED#        
4 STAN;EY      
5 MICHAE;      
6 JOSEPH]      
7 E;OZABETH    
8 RORY]        
d %>% 
  dplyr::select(midl_name) %>%
  dplyr::filter(stringr::str_detect(midl_name, "[^-a-zA-Z0-9/_%\'\"\\\\*\`~ \\.,\\\\(){}]"))
# A tibble: 15 x 1
   midl_name   
   <chr>       
 1 [ DAVID ] FI
 2 L!!!hold for
 3 (DECEASED ??
 4 ]           
 5 D}@IS       
 6 ;           
 7 KIYAUM]     
 8 GEAN]       
 9 T!          
10 PAU;        
11 PAU;        
12 ]ANN        
13 MAR;E       
14 JAYNE]      
15 G^          
  • 31 names with other characters
  • Some appear to substitute for single quote, e.g. O;NEAL, O=BOZOVICH
  • Some might be OCR errors, e.g. STAN;EY, E;OZABETH
  • Some are explanatory notes, e.g. “MERLE ATTN!!”, “L!!!hold for”, “(DECEASED ??”
  • Some appear to be random junk, e.g. FRED#, RORY], “;”, “G^”

Map other characters to empty string
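
The three per-column checks above could also be tallied in one pass. The following is a sketch only: the data frame `toy` and the shortened character class `other_chars` are hypothetical and cover just a few of the characters seen above, not the full pattern used in the checks.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for the real data frame d
toy <- tibble(
  last_name  = c("O;NEAL", "SMITH", "MOSELY]"),
  first_name = c("JOHN", "FRED#", "ANN"),
  midl_name  = c(NA, "T!", "LEE")
)

# Assumed subset of the "other" characters: ; = # ! ] [
other_chars <- "[;=#!\\]\\[]"

# Count, per name column, how many values contain one of these characters
counts <- toy %>%
  summarise(across(
    c(last_name, first_name, midl_name),
    ~ sum(str_detect(.x, other_chars), na.rm = TRUE)
  ))
counts
```

Missing names give `NA` from `str_detect()`, so `na.rm = TRUE` keeps them out of the count.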

3.6.4.17 Character summary

Map non-alphanumeric characters to empty string

3.6.5 Words

Look for words that shouldn’t be in names.

3.6.5.1 name_sufx_cd

name_sufx_cd: Voter name suffix

I am not going to use name suffix in entity resolution because age should be sufficient and is of much better quality.

Instead, look at the values that turn up in the name suffix so that the same values can be removed from the other name fields, where they should not occur but do.

d %>% dplyr::select(name_sufx_cd) %>% skimr::skim()
Table 3.2: Data summary
Name Piped data
Number of rows 8003293
Number of columns 1
_______________________
Column type frequency:
character 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
name_sufx_cd 7561920 0.06 1 3 0 222 0
table(d$name_sufx_cd, useNA = "ifany")

      ?       '     (GE     (II     (JR     (SR      \\       `       0     040 
      2       2       1       1       4       1       2      20       3       1 
    070     072      08       1     106      11     111     134      15     181 
      1       1       1       7       1     101     241       1       1       1 
     1V       2     2ND       3     346      39     3RD       5     5TH      77 
      5       4       1       1       1       1      14       1       2       1 
      8     8TH       9       A     AJR     AKB     ALB     ALM     ANN     ARK 
      1       1       1       1       1       1       1       1       6       1 
    ART     ARV       B     BAL     BAS     BAU     BEA     BEL     BEN     BOU 
      1       1       6       1       1       1       2       1       1       1 
    BRA     BRI     BRO     BUC     BUN       C      C.     CAM     CHA     CLA 
      1       3       1       1       1      10       1       1       1       1 
    COY     CRA     CUB     CUM     CUT       D     DAN     DAV     DIC     DIG 
      2       1       1       1       1       6       3       1       1       1 
     DO     DOR     DOU     DOV      DR     DR.       E     EDW     ELE     ELI 
      3       1       1       1       1       4       5       1       1       1 
    ELS     ETT     EWA      EY       F     F M     FAU     FOR     FRE       G 
      1       1       1       1       7       1       1       2       2       4 
    GLE     GRE     GUY       H     HAM     HIL     HOG     HOO     HUS       I 
      1       1       1       3       1       1       1       2       1     566 
     II     II.     III     IIL     ILI      IN     ING     IRM     ITH      IV 
  26023       3   56928       1       1       2       1       1       1    6955 
    IV.      IX       J     JAC     JAM      JD     JEN     JOH     JON     JOS 
      2       1      17       1       1       4       1       1       1       2 
     jr      JR     JR,     Jr.     JR.       K     KAP     KEN     KIN     KIT 
      1  295262       1       2    2832       4       1       1       1       1 
      L     LAR     LEE     LEN     LES     LEW     LIN      LL     LLL     LOC 
      8       1       2       1       1       1       1       3       2       1 
    LOU     LYN       M     M D     MAC     MAE     MAT     MCK     MCQ     MCR 
      2       1      11       1       1       1       1       1       1       1 
     MD     MMO     MOO     MOR      MR     MR.     MRS      MS     MS.     MUR 
      6       1       1       1      11      17     123       6      18       1 
      N     NGT     NOC     NON     NOR      NS       O     O'S      OD     OLI 
      3       1       1       1       1       1       2       1       2       1 
     ON     ONG      OV       P     PAU     PET     PHE     PIL     PLA     POP 
      1       1       1       2       1       1       1       1       1       1 
      Q       R     RAY     REB     REE     REV     ROB     ROD     ROY       S 
      3      10       1       1       1      10       2       1       1       5 
    SAM     SCO     SMI     SOR      sr      SR     Sr.     SR.     STA     STE 
      1       2       1       1       1   50917       3     562       2       1 
    SUE     SUM     SWA       T      TA     TOB     TWA     UNK       V     VAN 
      1       1       1       2       1       1       1       1     345       1 
    VER      VI     VII     VIR     VOS       W     WAL     WAR     WIL     WOL 
      1      44      14       1       1       7       1       1       2       1 
      X       Y    <NA> 
      1       1 7561920 
# get a better look at the cleaned suffixes
d %>% 
  dplyr::mutate(
    sufx = name_sufx_cd %>% 
      stringr::str_to_upper() %>% 
      stringr::str_remove_all(pattern = "[^A-Z0-9]") %>% # remove non-alphanumeric
      dplyr::na_if("") 
  ) %>% 
  dplyr::count(sufx) %>% 
  dplyr::arrange(desc(n), sufx) %>% 
  knitr::kable()
sufx n
NA 7561946
JR 298102
III 56928
SR 51484
II 26027
IV 6957
I 566
V 345
111 241
MRS 123
11 101
VI 44
MR 28
MS 24
J 17
3RD 14
VII 14
C 11
M 11
R 10
REV 10
L 8
1 7
F 7
MD 7
W 7
ANN 6
B 6
D 6
1V 5
DR 5
E 5
S 5
2 4
G 4
JD 4
K 4
0 3
BRI 3
DAN 3
DO 3
H 3
LL 3
N 3
Q 3
5TH 2
BEA 2
COY 2
FOR 2
FRE 2
HOO 2
IN 2
JOS 2
LEE 2
LLL 2
LOU 2
O 2
OD 2
P 2
ROB 2
SCO 2
STA 2
T 2
WIL 2
040 1
070 1
072 1
08 1
106 1
134 1
15 1
181 1
2ND 1
3 1
346 1
39 1
5 1
77 1
8 1
8TH 1
9 1
A 1
AJR 1
AKB 1
ALB 1
ALM 1
ARK 1
ART 1
ARV 1
BAL 1
BAS 1
BAU 1
BEL 1
BEN 1
BOU 1
BRA 1
BRO 1
BUC 1
BUN 1
CAM 1
CHA 1
CLA 1
CRA 1
CUB 1
CUM 1
CUT 1
DAV 1
DIC 1
DIG 1
DOR 1
DOU 1
DOV 1
EDW 1
ELE 1
ELI 1
ELS 1
ETT 1
EWA 1
EY 1
FAU 1
FM 1
GE 1
GLE 1
GRE 1
GUY 1
HAM 1
HIL 1
HOG 1
HUS 1
IIL 1
ILI 1
ING 1
IRM 1
ITH 1
IX 1
JAC 1
JAM 1
JEN 1
JOH 1
JON 1
KAP 1
KEN 1
KIN 1
KIT 1
LAR 1
LEN 1
LES 1
LEW 1
LIN 1
LOC 1
LYN 1
MAC 1
MAE 1
MAT 1
MCK 1
MCQ 1
MCR 1
MMO 1
MOO 1
MOR 1
MUR 1
NGT 1
NOC 1
NON 1
NOR 1
NS 1
OLI 1
ON 1
ONG 1
OS 1
OV 1
PAU 1
PET 1
PHE 1
PIL 1
PLA 1
POP 1
RAY 1
REB 1
REE 1
ROD 1
ROY 1
SAM 1
SMI 1
SOR 1
STE 1
SUE 1
SUM 1
SWA 1
TA 1
TOB 1
TWA 1
UNK 1
VAN 1
VER 1
VIR 1
VOS 1
WAL 1
WAR 1
WOL 1
X 1
Y 1
  • There are honorific titles: MRS, MR, MS, REV, MD, DR, JD
  • There are generation suffixes: JR, SR, I (1, J, L), II (2, 2ND, 11, LL), III (3RD, 111, IIL, ILI, LLL), IV, V (5TH), VI, VII, 8TH
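
To make the two groupings concrete, here is a small sketch that labels a raw suffix value as an honorific, a generation suffix, or something else. The helper `classify_sufx()` is hypothetical and not part of the analysis. The two lookup vectors are copied from the bullets above, and the cleaning steps mirror the chunk that produced the cleaned suffix table.

```r
library(dplyr)
library(stringr)

# Groupings taken from the two bullets above
honorifics  <- c("MRS", "MR", "MS", "REV", "MD", "DR", "JD")
generations <- c(
  "JR", "SR",
  "I", "1", "J", "L",
  "II", "2", "2ND", "11", "LL",
  "III", "3RD", "111", "IIL", "ILI", "LLL",
  "IV", "V", "5TH", "VI", "VII", "8TH"
)

# Hypothetical helper: clean the suffix as above, then label it
classify_sufx <- function(x) {
  sufx <- x %>%
    str_to_upper() %>%
    str_remove_all("[^A-Z0-9]") %>%   # remove non-alphanumeric
    na_if("")
  case_when(
    is.na(sufx)           ~ NA_character_,
    sufx %in% honorifics  ~ "honorific",
    sufx %in% generations ~ "generation",
    TRUE                  ~ "other"
  )
}

classify_sufx(c("Jr.", "111", "MRS", "WIL", "?", NA))
```

Values labelled "other" (e.g. WIL) look like fragments of names rather than suffixes.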

3.6.5.2 Honorific in name

Look for honorifics that have been put in name fields.

# last name
hons <- c(
  "MR", "MISTER", "MASTER", "MRS", "MS", "MISS", 
  "REV", "REVEREND", "SR", "SISTER", "BR", "BROTHER",
  "DR", "DOCTOR", "MD", "JD", "PROF", "PROFESSOR"
  ) %>% 
  glue::glue(x = . , "\\b{x}\\b") %>%  # honorifics must be words
  glue::glue_collapse(sep = "|") %>% 
  glue::glue(x = . , "({x})")

x <- d %>% 
  dplyr::filter(
    last_name %>% 
      stringr::str_to_upper() %>% 
      stringr::str_remove_all(pattern = "[^ A-Z]") %>% 
      stringr::str_squish() %>% 
      stringr::str_detect(pattern = hons)
  ) %>% 
  dplyr::arrange(last_name, sex, first_name)
nrow(x)
[1] 149
x %>% knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
AULSEYBROOK SR NORMAN D SR MALE 97 REMOVED REMOVED UNDER OLD PURGE LAW
BARRINGER MD PHIL LOUIS NA MALE 89 REMOVED DECEASED
BRAKE SR ESS CAROLYN G NA FEMALE 50 ACTIVE VERIFIED
BROTHER ANN MARIE NA FEMALE 30 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BROTHER ANN MARIE D NA FEMALE 30 REMOVED DUPLICATE
BROTHER SHERRY DEBORAH NA FEMALE 55 ACTIVE VERIFIED
BROTHER HASSAND OMAR NA MALE 28 ACTIVE VERIFIED
BROTHER SCOTT MICHAEL NA MALE 33 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
CARTER MD JOEY M NA MALE 67 REMOVED REMOVED UNDER OLD PURGE LAW
COLLINS (SISTER) M GRETA NA FEMALE 86 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
DOCTOR ADRIENNE N NA FEMALE 28 REMOVED MOVED FROM COUNTY
DOCTOR ADRIENNE NAKIA NA FEMALE 28 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
DOCTOR ANN Z NA FEMALE 28 ACTIVE VERIFIED
DOCTOR BLANCHE NA NA FEMALE 25 ACTIVE VERIFIED
DOCTOR CYNTHIA WHITTED NA FEMALE 36 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DOCTOR DIANE WINFREE NA FEMALE 42 ACTIVE VERIFIED
DOCTOR ESTHER WILLETTE NA FEMALE 57 ACTIVE VERIFIED
DOCTOR FRANKSENE HOUSTON NA FEMALE 60 ACTIVE VERIFICATION PENDING
DOCTOR IRIS DAVIS NA FEMALE 62 REMOVED MOVED FROM COUNTY
DOCTOR IRIS DAVIS NA FEMALE 62 REMOVED MOVED FROM COUNTY
DOCTOR IRIS JANE NA FEMALE 62 ACTIVE VERIFIED
DOCTOR JOANNE MULLEN NA FEMALE 25 ACTIVE VERIFIED
DOCTOR KATHY HAMILTON NA FEMALE 31 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DOCTOR LATONYA E NA FEMALE 28 INACTIVE CONFIRMATION NOT RETURNED
DOCTOR LETICIA YVETTE NA FEMALE 29 REMOVED MOVED FROM COUNTY
DOCTOR LETICIA Y NA FEMALE 29 ACTIVE VERIFIED
DOCTOR LOUISE WHITFIELD NA FEMALE 65 ACTIVE VERIFIED
DOCTOR MARIE LORRAINE NA FEMALE 35 ACTIVE VERIFICATION PENDING
DOCTOR MARY DURKEE NA FEMALE 49 ACTIVE LEGACY DATA
DOCTOR MELISSA A NA FEMALE 27 ACTIVE VERIFIED
DOCTOR MONICO MOORE NA FEMALE 40 REMOVED MOVED FROM COUNTY
DOCTOR MONICO RENE NA FEMALE 40 ACTIVE VERIFIED
DOCTOR MONIKE NA NA FEMALE 35 ACTIVE VERIFIED
DOCTOR PARISTEEN HARRINGTON NA FEMALE 74 ACTIVE VERIFIED
DOCTOR PORTIA R NA FEMALE 28 REMOVED FELONY CONVICTION
DOCTOR PORTIA REVON NA FEMALE 28 REMOVED FELONY CONVICTION
DOCTOR ROBIN W NA FEMALE 45 REMOVED MOVED FROM COUNTY
DOCTOR SARAH STUART NA FEMALE 81 ACTIVE VERIFIED
DOCTOR SARAH MARIE NA FEMALE 79 ACTIVE VERIFIED
DOCTOR SUE NA NA FEMALE 89 REMOVED DECEASED
DOCTOR SUSAN ELLEN NA FEMALE 46 REMOVED REMOVED UNDER OLD PURGE LAW
DOCTOR SUSAN ELLEN NA FEMALE 46 REMOVED MOVED FROM COUNTY
DOCTOR TANNEH TEAH NA FEMALE 28 ACTIVE VERIFIED
DOCTOR VERNETHA OMEGA NA FEMALE 27 ACTIVE VERIFIED
DOCTOR ALEXANDER K NA MALE 28 INACTIVE CONFIRMATION NOT RETURNED
DOCTOR ALFRED NA JR MALE 38 INACTIVE CONFIRMATION NOT RETURNED
DOCTOR ALPHONSE NA NA MALE 70 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DOCTOR CLIFFORD GARY NA MALE 45 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DOCTOR CLIFFORD JEROME NA MALE 26 ACTIVE VERIFIED
DOCTOR DANIEL L NA MALE 40 REMOVED MOVED FROM STATE
DOCTOR DANIEL L NA MALE 40 ACTIVE VERIFIED
DOCTOR DONALD NA NA MALE 54 ACTIVE VERIFIED
DOCTOR DONALD LYNN NA MALE 49 ACTIVE LEGACY DATA
DOCTOR FANNIE W NA MALE 81 ACTIVE VERIFIED
DOCTOR GLENN ANTOINE NA MALE 25 ACTIVE VERIFIED
DOCTOR HENRY NA NA MALE 87 ACTIVE VERIFIED
DOCTOR JASON ALEXANDER NA MALE 27 REMOVED MOVED FROM STATE
DOCTOR JASON NA NA MALE 25 ACTIVE VERIFIED
DOCTOR JEFFREY JAMES NA MALE 33 DENIED VERIFICATION RETURNED UNDELIVERABLE
DOCTOR JOHNNY LEWIS SR MALE 72 ACTIVE VERIFIED
DOCTOR KENNETH RAY NA MALE 46 ACTIVE VERIFIED
DOCTOR LOUIS NA NA MALE 88 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
DOCTOR RICHARD NA III MALE 89 ACTIVE VERIFIED
DOCTOR ROBERT B NA MALE 44 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
DOCTOR TERRENCE GERORD NA MALE 19 ACTIVE VERIFIED
DOCTOR TONY MELVIN SR MALE 51 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
DOCTOR TONY M NA MALE 53 ACTIVE VERIFIED
DOCTOR TRYELLE TRIAWAN NA MALE 22 ACTIVE VERIFIED
DOSS SR MICHAEL RAY NA MALE 45 ACTIVE VERIFIED
DR HENIETTA NA NA FEMALE 47 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
FEE (SISTER) HELENE NA NA FEMALE 67 REMOVED MOVED FROM COUNTY
HICKS SR WILFORD LYTLE SR. MALE 87 ACTIVE VERIFIED
HOWELL SR BILL ZIP NA MALE 80 REMOVED DECEASED
HUMPHREY SR DAVID EVANDER NA MALE 76 ACTIVE VERIFICATION PENDING
LA MASTER CYNTHIA TREADWELL NA FEMALE 46 REMOVED MOVED FROM COUNTY
LA MASTER FRANKLIN THOMAS NA MALE 50 REMOVED MOVED FROM COUNTY
LE MASTER YOLANDA SHONTA NA FEMALE 34 ACTIVE VERIFIED
LEE SR LAUCHLIN MCKINNON NA MALE 71 REMOVED MOVED FROM COUNTY
MAC MASTER GEORGIA PALIKARAS NA FEMALE 47 ACTIVE VERIFIED
MASTER BEVERLYN MCLEOD NA FEMALE 54 ACTIVE CONFIRMATION PENDING
MASTER KAREN LEONARD NA FEMALE 43 REMOVED MOVED FROM STATE
MASTER KAREN ELISE NA FEMALE 26 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MASTER MARCIA FROULA NA FEMALE 52 ACTIVE LEGACY DATA
MASTER MARY K NA FEMALE 44 ACTIVE VERIFICATION PENDING
MASTER MAUREEN NA NA FEMALE 46 REMOVED REMOVED UNDER OLD PURGE LAW
MASTER MAUREEN R NA FEMALE 70 ACTIVE VERIFIED
MASTER MELISSA ANNE NA FEMALE 39 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MASTER SHELIA MARIE NA FEMALE 33 ACTIVE VERIFIED
MASTER STEPHANIE L NA FEMALE 42 REMOVED MOVED FROM COUNTY
MASTER SUSAN DOROTHY NA FEMALE 33 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MASTER AMIR NA NA MALE 49 REMOVED MOVED FROM COUNTY
MASTER BARRY LEWIS NA MALE 54 ACTIVE VERIFIED
MASTER EDWARD FRANCIS NA MALE 42 REMOVED REMOVED UNDER OLD PURGE LAW
MASTER EDWARD J JR MALE 74 ACTIVE VERIFIED
MASTER MARK WAYNE NA MALE 38 REMOVED REQUEST FROM VOTER
MASTER RONALD EARL NA MALE 50 REMOVED MOVED FROM STATE
MCCLAIN (SISTER) M MILDRED NA FEMALE 77 REMOVED MOVED FROM COUNTY
MCDONOUGH (SISTER) M BERNITA NA FEMALE 90 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MISS BRANDI ALEESA NA FEMALE 29 REMOVED MOVED FROM COUNTY
MISS BRANDI A NA FEMALE 29 ACTIVE VERIFIED
MISS BRANDI ALEESA NA FEMALE 29 REMOVED MOVED FROM COUNTY
MISS CONNIE SHRIVER NA FEMALE 49 ACTIVE VERIFIED
MISS BENJAMIN THOMAS NA MALE 22 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MISS ROBERT EDWARD NA MALE 68 REMOVED REMOVED UNDER OLD PURGE LAW
MISS STEPHEN P NA MALE 38 REMOVED REMOVED UNDER OLD PURGE LAW
MISS STEPHEN PATRICK NA MALE 38 ACTIVE VERIFIED
MISS THOMAS CHARLES NA MALE 56 ACTIVE VERIFIED
MISTER CHARLENE NOYES NA FEMALE 60 ACTIVE VERIFIED
MISTER EBONY N NA FEMALE 23 ACTIVE VERIFIED
MISTER EDNA ELIZAB SPENCE NA FEMALE 59 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MISTER MARTHA HIGGS NA FEMALE 78 REMOVED MOVED FROM COUNTY
MISTER MARTHA HIGGS NA FEMALE 78 ACTIVE UNVERIFIED
MISTER MARTHA HIGGS NA FEMALE 78 REMOVED MOVED FROM COUNTY
MISTER MARTHA HIGGS NA FEMALE 78 REMOVED DUPLICATE
MISTER MELISSA MARIA NA FEMALE 35 REMOVED MOVED FROM STATE
MISTER MELISSA MARIA NA FEMALE 35 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MISTER NORMA L NA FEMALE 83 REMOVED DECEASED
MISTER PAMELA JEAN NA FEMALE 52 ACTIVE VERIFIED
MISTER ROCHELLA NA NA FEMALE 34 ACTIVE VERIFIED
MISTER RUBY JOHNSON NA FEMALE 87 ACTIVE VERIFIED
MISTER SONYA ROBIN NA FEMALE 40 ACTIVE VERIFIED
MISTER STASIA MAE NA FEMALE 35 ACTIVE VERIFIED
MISTER BRYAN WESLEY NA MALE 40 ACTIVE VERIFIED
MISTER BRYAN WESLEY NA MALE 40 REMOVED REMOVED UNDER OLD PURGE LAW
MISTER GILBERT GLENWOOD NA MALE 87 REMOVED DECEASED
MISTER JOHN EDWARD NA MALE 60 ACTIVE VERIFIED
MISTER JOHN EDWARD NA MALE 60 REMOVED MOVED FROM COUNTY
MISTER LARRY D NA MALE 47 ACTIVE VERIFIED
MISTER LONNIE THOMAS NA MALE 62 REMOVED MOVED FROM STATE
MISTER LONNIE T NA MALE 41 INACTIVE CONFIRMATION NOT RETURNED
MISTER MICHAEL LEE NA MALE 32 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
MISTER MICHAEL LEE NA MALE 32 ACTIVE VERIFIED
MISTER THOMAS COLLIER NA MALE 84 ACTIVE VERIFIED
MISTER WESLEY ALLEN NA MALE 27 REMOVED MOVED FROM COUNTY
MISTER WESLEY A NA MALE 27 ACTIVE VERIFIED
MR FEWEL THOMAS WALLACE NA MALE 59 REMOVED REMOVED UNDER OLD PURGE LAW
MURPHY DR JAMES JOSEPH NA MALE 67 REMOVED REMOVED UNDER OLD PURGE LAW
PERROTT SR JOHN WILLIAM NA MALE 87 REMOVED REMOVED UNDER OLD PURGE LAW
PROFFITT SR BILLY EUGENE NA MALE 69 ACTIVE LEGACY DATA
PUTNAM SR EDWARD LIONEL NA MALE 79 REMOVED DECEASED
SMITH MD PATRICIA ANN NA FEMALE 40 ACTIVE VERIFIED
STIMSON SR RICHARD BARRETT NA MALE 60 ACTIVE VERIFIED
THORSON SR LLOYD EDWARD NA MALE 71 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
TRUETT SR TIMOTHY J NA MALE 38 REMOVED FELONY CONVICTION
TYLER SR KENNETH AARON NA MALE 39 REMOVED REMOVED UNDER OLD PURGE LAW
VAUGHN SR WALTER S NA MALE 78 ACTIVE VERIFIED
WHITWORTH SR RANDY SEAN NA MALE 30 ACTIVE VERIFIED
WILKIE SR WILLIAM HOYT NA MALE 86 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
WILLIAMSON DR IRVIN D NA MALE 43 ACTIVE VERIFIED

Last name

  • BROTHER, DOCTOR, MASTER, MISS, MISTER appear to be legitimate last names
  • MR is the only prefix: MR FEWEL
  • SR, MD, SISTER, DR are suffixes
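
The behaviour of the word-boundary pattern can be seen on a few values. This sketch builds the same kind of alternation with base R `paste0()` instead of glue, using only a subset of the honorific list. The vectors `words` and `last_names` are hypothetical illustrations.

```r
library(stringr)

# A subset of the honorific list above, wrapped in word boundaries
words   <- c("MR", "MASTER", "SR", "DR", "DOCTOR")
pattern <- paste0("(", paste0("\\b", words, "\\b", collapse = "|"), ")")

# Hypothetical last names: some contain an honorific as a whole word, some do not
last_names <- c("PROFFITT SR", "LA MASTER", "MASTERSON", "DOCTOR", "DREW", "MURPHY DR")
flagged    <- str_detect(last_names, pattern)

data.frame(last_name = last_names, flagged = flagged)
```

MASTERSON and DREW are not flagged because the honorific is not a whole word there. A surname that is exactly DOCTOR is flagged, which is why the matches still need to be reviewed by eye.
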
# first name
hons <- c(
  "MR", "MISTER", "MASTER", "MRS", "MS", "MISS", 
  "REV", "REVEREND", "SR", "SISTER", "BR", "BROTHER",
  "DR", "DOCTOR", "MD", "JD", "PROF", "PROFESSOR"
  ) %>% 
  glue::glue(x = . , "\\b{x}\\b") %>%  # honorifics must be words
  glue::glue_collapse(sep = "|") %>% 
  glue::glue(x = . , "({x})")

x <- d %>% 
  dplyr::filter(
    first_name %>% 
      stringr::str_to_upper() %>% 
      stringr::str_remove_all(pattern = "[^ A-Z]") %>% 
      stringr::str_detect(pattern = hons)
  ) %>% 
  dplyr::arrange(first_name, sex, last_name)
nrow(x)
[1] 252
x %>% knitr::kable()
last_name first_name midl_name name_sufx_cd sex age voter_status_desc voter_status_reason_desc
WALKER (MRS) LOLA M NA FEMALE 112 REMOVED REMOVED UNDER OLD PURGE LAW
RICE (REV) CALVIN SHIRLEY NA MALE 79 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
ESTES ALMA MRS A NA FEMALE 82 ACTIVE VERIFIED
TILLEY ARNOLD MRS NA NA FEMALE 77 REMOVED MOVED FROM COUNTY
CROMER BETTY MRS A NA FEMALE 78 ACTIVE VERIFIED
SCALES BETTY MRS H NA FEMALE 69 ACTIVE VERIFIED
PEACEMAKER BROTHER NA NA MALE 59 ACTIVE VERIFIED
ASHE DOCTOR NA NA FEMALE 29 ACTIVE VERIFIED
AAL-ANUBIA DOCTOR O NA MALE 57 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
AAL-ANUBIA DOCTOR M NA MALE 36 ACTIVE VERIFIED
AAL-ANUBIAIMHO DOCTOR M NA MALE 36 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
AAL-ANUBIAIMHOTE DOCTOR K NA MALE 38 ACTIVE VERIFIED
AALANUBIAIMHOTEPOKOR DOCTOR M NA MALE 36 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
ALSTON DOCTOR AMOS NA MALE 67 ACTIVE VERIFIED
ATHAY DOCTOR WEBB NA MALE 43 ACTIVE LEGACY DATA
BAKER DOCTOR CLAUDE NA MALE 95 REMOVED REMOVED UNDER OLD PURGE LAW
BEASLEY DOCTOR R NA MALE 97 REMOVED DECEASED
BOWEN DOCTOR GLENN JR MALE 68 ACTIVE VERIFIED
BRICE DOCTOR WARREN NA MALE 94 REMOVED REMOVED UNDER OLD PURGE LAW
BROWN DOCTOR THURMAN NA MALE 101 REMOVED DECEASED
BULLOCK DOCTOR GEORGE NA MALE 105 REMOVED DECEASED
CARPENTER DOCTOR LOYDE NA MALE 88 REMOVED DECEASED
CLAYTON DOCTOR CICRO NA MALE 88 REMOVED DECEASED
EVANS DOCTOR NA JR MALE 58 ACTIVE VERIFIED
EWING DOCTOR BUISE NA MALE 79 ACTIVE VERIFIED
FIELDS DOCTOR ARNOLD NA MALE 70 ACTIVE VERIFIED
FORSYTHE DOCTOR LOUIS NA MALE 84 ACTIVE VERIFIED
FRANKLIN DOCTOR BENJAMIN NA MALE 80 ACTIVE VERIFIED
FRAZIER DOCTOR BUCK NA MALE 70 ACTIVE LEGACY DATA
GOWER DOCTOR HUBERT NA MALE 81 ACTIVE LEGACY DATA
HAYES DOCTOR DANIEL NA MALE 74 REMOVED DECEASED
HINSON DOCTOR SLADE NA MALE 86 REMOVED DECEASED
HOLLAND DOCTOR RALPH NA MALE 73 REMOVED DECEASED
HUMPHREY DOCTOR JEROME NA MALE 75 ACTIVE LEGACY DATA
HUSSEY DOCTOR L NA MALE 74 ACTIVE VERIFIED
JEFFERSON DOCTOR JAMES NA MALE 73 ACTIVE VERIFIED
JONES DOCTOR BRUCE JR MALE 54 ACTIVE VERIFIED
LEONARD DOCTOR MARK NA MALE 46 REMOVED REMOVED UNDER OLD PURGE LAW
MCCULLOCH DOCTOR W NA MALE 69 REMOVED DECEASED
MCDANIEL DOCTOR C NA MALE 87 REMOVED DECEASED
PHIPPS DOCTOR CONLEY NA MALE 83 ACTIVE VERIFIED
PRUETTE DOCTOR MAX JR MALE 61 ACTIVE LEGACY DATA
RABON DOCTOR RICHARD NA MALE 0 ACTIVE LEGACY DATA
RUDD DOCTOR FRANKLIN JR MALE 78 ACTIVE VERIFIED
SALAAM DOCTOR ABDULLAH NA MALE 52 ACTIVE VERIFIED
SHIVER DOCTOR ELLIS JR MALE 45 ACTIVE VERIFIED
SMART DOCTOR NORRIS NA MALE 0 REMOVED MOVED FROM COUNTY
SPAULDING DOCTOR F NA MALE 0 REMOVED DECEASED
STEVENS DOCTOR JOHN NA MALE 39 ACTIVE VERIFIED
STEVENS DOCTOR J NA MALE 39 REMOVED MOVED FROM COUNTY
WARD DOCTOR ERNEST NA MALE 84 ACTIVE VERIFIED
WATERS DOCTOR TOMMIE NA MALE 76 REMOVED DECEASED
WEBB DOCTOR B NA MALE 204 REMOVED DECEASED
WILLIAMS DOCTOR FRANKLIN NA MALE 85 REMOVED DECEASED
NICHOLS DORIS ( MRS W NA NA FEMALE 92 ACTIVE VERIFIED
HEARN DR JOHN WILLOUG NA MALE 80 ACTIVE LEGACY DATA
MAYS DR DAVID NA MALE 59 ACTIVE LEGACY DATA
MOORE DR H W NA MALE 99 INACTIVE CONFIRMATION NOT RETURNED
AAL-ANUBIAIMHOTE DR NGOZI NA NA FEMALE 55 ACTIVE VERIFIED
SGRO DR. BEVERLY HUTSON NA FEMALE 64 ACTIVE LEGACY DATA
HINES E L - MRS NA NA FEMALE 93 REMOVED DECEASED
PENUEL EDGAR - MRS E NA FEMALE 89 REMOVED DECEASED
GOOLSBY EUGENE MRS NA NA FEMALE 79 ACTIVE VERIFIED
HARTIS FRANK E MRS THAMES NA FEMALE 77 ACTIVE VERIFIED
BARBER GEORGE SR B NA MALE 76 REMOVED DECEASED
BINGMAN GRAY MRS NA NA FEMALE 68 ACTIVE VERIFIED
GIBSON H MRS L NA FEMALE 78 ACTIVE VERIFIED
ROBINSON HARVEY MRS W NA FEMALE 0 REMOVED DECEASED
MCGILL ISAIAH SR NA NA MALE 80 REMOVED DECEASED
NEWTON J R - MRS NA NA FEMALE 105 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BAKER J.D. NA NA MALE 49 ACTIVE LEGACY DATA
HARRIS J.D. NA NA MALE 80 REMOVED ADMINISTRATIVE
HAYES J.D. NA NA MALE 81 REMOVED ADMINISTRATIVE
LEATHERMAN J.D. ANDREW NA MALE 77 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
THOMPSON J.D. NA NA MALE 78 REMOVED REMOVED UNDER OLD PURGE LAW
PROCTOR J.D., NA JR FEMALE 76 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
MASSAGEE JAMES H MRS SUE NA FEMALE 71 ACTIVE VERIFIED
FULP JAMES MRS C NA FEMALE 78 ACTIVE VERIFIED
MARTIN JAMES MRS H NA FEMALE 70 ACTIVE VERIFIED
TRULL JAMES MRS T NA FEMALE 75 ACTIVE VERIFIED
BREWINGTON JD D NA MALE 72 ACTIVE VERIFIED
BROWN JD NA JR MALE 68 ACTIVE VERIFIED
CLINE JD NA NA MALE 61 REMOVED DECEASED
DOUGLAS JD NA NA MALE 61 ACTIVE VERIFICATION PENDING
FAIR JD FAIR NA MALE 81 ACTIVE VERIFIED
GREEN JD WILLIAM NA MALE 27 ACTIVE VERIFIED
HERRING JD NA SR MALE 62 ACTIVE VERIFIED
HUNT JD D NA MALE 60 ACTIVE VERIFIED
ISAACS JD BOBBY NA MALE 81 ACTIVE LEGACY DATA
MILES JD NA NA MALE 65 ACTIVE VERIFIED
PAINTER JD NA NA MALE 71 REMOVED DECEASED
PRUITT JD Sr NA MALE 71 REMOVED FELONY CONVICTION
PRUITT JD NA JR MALE 48 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
QUEEN JD NA NA MALE 65 ACTIVE LEGACY DATA
THORNE JD NA NA MALE 83 REMOVED MOVED FROM STATE
VANHORN JD ELLIOTT NA MALE 46 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
VANHORN JD ELLIOTT JR MALE 26 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
WILLIAMS JD WESLEY NA MALE 71 ACTIVE LEGACY DATA
PADGETT JOE (DR) C. NA MALE 81 REMOVED DECEASED
WHITE JOE MRS MRS NA FEMALE 86 ACTIVE VERIFIED
LANGSTON JOHN - MRS F JR FEMALE 93 REMOVED DECEASED
GURGANIOUS JOHN MRS HALLIE NA FEMALE 85 ACTIVE VERIFIED
HAMRICK JOHN R MRS MARGARET NA FEMALE 84 ACTIVE VERIFIED
DUNTON JULIAN SR NA NA MALE 0 ACTIVE VERIFIED
WARD MARVIN MRS M NA FEMALE 89 ACTIVE VERIFIED
ALLAH MASTER SAYYID CEE’I NA MALE 29 ACTIVE VERIFIED
BLANKS MASTER R NA MALE 49 ACTIVE VERIFIED
BOND MASTER GEE NA MALE 24 REMOVED FELONY CONVICTION
BOND MASTER GEE NA MALE 24 REMOVED FELONY CONVICTION
BROWDER MASTER PAUL NA MALE 21 ACTIVE VERIFIED
LEGGETT MASTER KARRIEM NA MALE 30 REMOVED FELONY CONVICTION
MCGUIRE MASTER MIKKEL BRYANT NA MALE 51 ACTIVE VERIFIED
PATE MASTER BOWCIVIS NA MALE 36 ACTIVE VERIFIED
AKBOR MD S NA MALE 28 ACTIVE VERIFICATION PENDING
STOCKELL MD COOPER III MALE 50 ACTIVE VERIFIED
ANGKANA MISS NA NA FEMALE 30 REMOVED REQUEST FROM VOTER
ARIEL MISS NA NA FEMALE 21 ACTIVE CONFIRMATION PENDING
HALL MISS EDNA ESTELLE NA FEMALE 108 REMOVED REMOVED UNDER OLD PURGE LAW
LEE MISS VIRGINIA SHA NA FEMALE 61 REMOVED REMOVED UNDER OLD PURGE LAW
NWEMBYA MISS MUADI NA FEMALE 29 INACTIVE CONFIRMATION NOT RETURNED
SPEIGHT MISS STEPHANI RENEE’ NA FEMALE 31 ACTIVE VERIFIED
CARTER MISTER MALCOLM NA MALE 25 ACTIVE VERIFIED
LUTHER MISTER WILSON NA MALE 31 ACTIVE VERIFIED
MCNEELY MISTER SECREST NA MALE 55 ACTIVE VERIFIED
MILLER MISTER C NA MALE 26 ACTIVE CONFIRMATION PENDING
PATE MISTER ANGELO NA MALE 38 ACTIVE VERIFIED
PATTON MISTER W NA MALE 31 ACTIVE VERIFICATION PENDING
PHILLIPS MISTER WAHKING NA MALE 27 ACTIVE VERIFIED
RABY MISTER HOLLY NA MALE 24 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
ROGERS MISTER MASIO NA MALE 32 DENIED VERIFICATION RETURNED UNDELIVERABLE
ACKBAR MR NA NA MALE 55 INACTIVE CONFIRMATION NOT RETURNED
FATE MR NA NA MALE 43 ACTIVE VERIFIED
KANE MR NA NA MALE 34 ACTIVE VERIFIED
KEVIN MR NA NA MALE 41 INACTIVE CONFIRMATION NOT RETURNED
BENNETT MRS PERCIVAL R FEMALE 97 REMOVED DECEASED
CARROLL MRS MABLE P NA FEMALE 104 REMOVED REMOVED UNDER OLD PURGE LAW
CARSON MRS ANNIE GREENE NA FEMALE 62 REMOVED REMOVED UNDER OLD PURGE LAW
CATES MRS CALLIE NA FEMALE 110 REMOVED REMOVED UNDER OLD PURGE LAW
COLEY MRS N NA FEMALE 204 INACTIVE CONFIRMATION NOT RETURNED
COOK MRS JOHN NA FEMALE 98 REMOVED DECEASED
DILLARD MRS NANCY L NA FEMALE 69 REMOVED REMOVED UNDER OLD PURGE LAW
DLOPFER MRS MARTHA S NA FEMALE 70 REMOVED REMOVED UNDER OLD PURGE LAW
FREELAND MRS HAZEL E NA FEMALE 94 REMOVED REMOVED UNDER OLD PURGE LAW
GARLAND MRS DALLAS JR FEMALE 79 REMOVED DECEASED
GATES MRS ULA PARKER NA FEMALE 79 REMOVED REMOVED UNDER OLD PURGE LAW
GOBBLE MRS RACHEL PAULI NA FEMALE 89 REMOVED REMOVED UNDER OLD PURGE LAW
GURGANUS MRS CHARLES NA FEMALE 84 REMOVED DECEASED
HAYNES MRS BETTY S NA FEMALE 59 REMOVED REMOVED UNDER OLD PURGE LAW
HILL MRS PATTY MAYNAR NA FEMALE 61 REMOVED REMOVED UNDER OLD PURGE LAW
HOMOLA MRS JEAN ROBERTS NA FEMALE 74 REMOVED REMOVED UNDER OLD PURGE LAW
INGRAHAM MRS LEONORE H NA FEMALE 99 REMOVED REMOVED UNDER OLD PURGE LAW
JOHNSON MRS NAOMI SCURLO NA FEMALE 95 REMOVED REMOVED UNDER OLD PURGE LAW
KENT MRS NELLIE MAY NA FEMALE 89 REMOVED REMOVED UNDER OLD PURGE LAW
KLOPFER MRS EDITH B NA FEMALE 109 REMOVED REMOVED UNDER OLD PURGE LAW
KNIGHT MRS CHERRIE MOOR NA FEMALE 59 REMOVED REMOVED UNDER OLD PURGE LAW
LEE MRS ANNIE PROCTO NA FEMALE 110 REMOVED REMOVED UNDER OLD PURGE LAW
LUU MRS NA NA FEMALE 54 ACTIVE VERIFIED
MANRING MRS ZORA E NA FEMALE 91 REMOVED REMOVED UNDER OLD PURGE LAW
MORRIS MRS EDITH ELLIS NA FEMALE 91 REMOVED REMOVED UNDER OLD PURGE LAW
PICKARD MRS NOVELLA R NA FEMALE 97 REMOVED REMOVED UNDER OLD PURGE LAW
RUSSELL MRS IRA MAE NA FEMALE 85 REMOVED REMOVED UNDER OLD PURGE LAW
SNIPES MRS CARRIE T NA FEMALE 104 REMOVED REMOVED UNDER OLD PURGE LAW
STEWART MRS J K NA FEMALE 100 REMOVED REMOVED UNDER OLD PURGE LAW
TILLERY MRS J T NA FEMALE 94 REMOVED DECEASED
WILKINS MRS CLAIR PICKET NA FEMALE 117 REMOVED REMOVED UNDER OLD PURGE LAW
WILLIAMS MRS BETTY H NA FEMALE 103 REMOVED REMOVED UNDER OLD PURGE LAW
WILSON MRS EDNA P NA FEMALE 86 REMOVED REMOVED UNDER OLD PURGE LAW
WOMACK MRS BROADUS NA FEMALE 100 REMOVED REMOVED UNDER OLD PURGE LAW
KIMREY MRS OLLIE NA MALE 111 REMOVED REMOVED UNDER OLD PURGE LAW
EATON MRS JOHN C NA FEMALE 81 ACTIVE VERIFIED
JEFFERSON MRS ATHOL G NA FEMALE 74 ACTIVE VERIFIED
FIELDS MRS A D NA FEMALE 82 INACTIVE CONFIRMATION NOT RETURNED
BRICKHOUSE MRS CLAUD NA NA FEMALE 93 REMOVED DECEASED
JOHNSON MRS CLYDE W NA FEMALE 74 ACTIVE VERIFIED
MODLIN MRS CLYDE H NA FEMALE 104 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
JOHNSON MRS ERNEST H NA FEMALE 76 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
HARRIS MRS FRED W NA FEMALE 75 ACTIVE VERIFIED
FIELDS MRS G CLINTON NA FEMALE 90 ACTIVE VERIFIED
LILLEY MRS G C {MARJORIE NA FEMALE 77 REMOVED DECEASED
BURKE MRS GEORGE W NA FEMALE 69 ACTIVE VERIFIED
CHATMAN MRS H L NA FEMALE 86 ACTIVE VERIFIED
DAVENPORT MRS H T NA FEMALE 90 ACTIVE VERIFIED
SELBY MRS J D (VIVIAN) NA FEMALE 97 REMOVED DECEASED
FIELDS MRS JAMES C NA FEMALE 84 ACTIVE VERIFIED
COOPER MRS JESSE R NA FEMALE 90 REMOVED REQUEST FROM VOTER
HOLLIDAY MRS JOSEPH NA NA FEMALE 104 ACTIVE VERIFIED
REICH MRS LESTER G NA FEMALE 86 ACTIVE VERIFIED
SPENCE MRS LOUIS ROBERT NA FEMALE 88 REMOVED MOVED FROM COUNTY
RUFF MRS MARTIE M NA FEMALE 105 REMOVED REMOVED UNDER OLD PURGE LAW
STEPPE MRS MAXINE NA NA FEMALE 72 REMOVED DECEASED
POPE MRS O N JR FEMALE 68 ACTIVE VERIFIED
HARRIS MRS P D NA FEMALE 104 ACTIVE VERIFIED
WHITE MRS PAUL B NA FEMALE 0 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
KNIGHT MRS R S ( RUTH ) NA FEMALE 100 REMOVED DECEASED
MORGAN MRS ROY A NA FEMALE 82 ACTIVE VERIFIED
GIBBS MRS THEODORE C. NA FEMALE 84 REMOVED DECEASED
FIELDS MRS W A NA FEMALE 86 REMOVED DECEASED
BATEMAN MRS W E (POLLY) NA FEMALE 93 REMOVED DECEASED
WOODLEY MRS WALLACE ( RUTH ) NA FEMALE 78 ACTIVE VERIFIED
ADAMS MRS WILBERT W NA FEMALE 73 REMOVED DECEASED
RIVES MRS WILBUR A NA FEMALE 71 ACTIVE VERIFIED
MOODY MRS WILLARD W NA FEMALE 98 ACTIVE VERIFIED
BECK MRS WILLIAM E NA FEMALE 79 ACTIVE VERIFIED
HARRIS MRS WILLIAM W NA FEMALE 61 ACTIVE VERIFIED
SMITH MRS WILLIAM JOE DAVIS NA FEMALE 72 ACTIVE VERIFIED
BYRD MRS. TITUS S NA FEMALE 99 REMOVED REMOVED UNDER OLD PURGE LAW
CARTER PAUL MRS NA JR FEMALE 73 ACTIVE VERIFIED
CAMPBELL PROFESSOR JASON NA MALE 21 ACTIVE VERIFIED
SWINSON RALPH - MRS NA NA FEMALE 105 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
BRADLEY RANDOLPH SR NA NA MALE 72 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
DODSON RAY MRS NA NA FEMALE 68 ACTIVE VERIFIED
BRIGGS REV DENNIS NA MALE 47 ACTIVE VERIFICATION PENDING
HULBERT REV IRWIN JR MALE 90 REMOVED DECEASED
MCCLEESE REV. MINNIE NA FEMALE 83 REMOVED DECEASED
FEATHERSTONE REV. ROBERT A NA MALE 83 ACTIVE VERIFIED
RHONEY ROBERT MRS T NA FEMALE 92 ACTIVE VERIFIED
CRESS SISTER DE PORRES NA FEMALE 103 REMOVED REMOVED UNDER OLD PURGE LAW
DASHNER SISTER JULIUS NA FEMALE 64 REMOVED MOVED FROM COUNTY
DOUGHERTY SISTER GERMAINE NA FEMALE 0 REMOVED MOVED FROM COUNTY
DRUDING SISTER MARJORIE NA FEMALE 81 REMOVED MOVED FROM COUNTY
GILDEA SISTER THERESINE NA FEMALE 69 REMOVED MOVED FROM COUNTY
GILDEA SISTER THERESINE NA FEMALE 69 ACTIVE VERIFIED
HENNESSEE SISTER PAUL TERESA NA FEMALE 70 REMOVED MOVED FROM COUNTY
JACOBETTI SISTER MARCELLINA NA FEMALE 0 REMOVED MOVED FROM COUNTY
KALYAN SISTER JULIANNE NA FEMALE 0 REMOVED MOVED FROM COUNTY
KELLY SISTER ANN NA FEMALE 71 ACTIVE VERIFIED
LOWERY SISTER MARY MARK NA FEMALE 75 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
MCNALLY SISTER JEANNE MARGA NA FEMALE 74 REMOVED REMOVED UNDER OLD PURGE LAW
MERKEL SISTER ALOYSIUS NA FEMALE 0 REMOVED MOVED FROM COUNTY
PASK SISTER JUDITH NA FEMALE 63 REMOVED REMOVED UNDER OLD PURGE LAW
PEGUESE SISTER GIRTRUE NA FEMALE 47 ACTIVE VERIFIED
PISKURICH SISTER ANCILLA NA FEMALE 95 REMOVED MOVED FROM COUNTY
ROLF SISTER GEORGE NA FEMALE 64 REMOVED MOVED FROM COUNTY
ROSS SISTER S NA FEMALE 79 ACTIVE VERIFIED
SINCLAIR SISTER PEGEUSE NA FEMALE 47 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
STOPPER SISTER CLARE NA FEMALE 0 REMOVED MOVED FROM COUNTY
SYKES SISTER ANNE MARIE NA FEMALE 64 REMOVED REMOVED UNDER OLD PURGE LAW
TANCRAITOR SISTER MAXINE NA FEMALE 73 REMOVED MOVED FROM COUNTY
TIMPERIO SISTER MARIA GORETT NA FEMALE 76 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
VELCICH SISTER IMELDA NA FEMALE 89 REMOVED DECEASED
VELCICH SISTER IMELDA NA FEMALE 89 REMOVED MOVED FROM COUNTY
MEEHAN SISTER LORETT JOHN NA FEMALE 77 REMOVED REMOVED UNDER OLD PURGE LAW
WELCICH SISTER M IMELD NA FEMALE 89 REMOVED DECEASED
TANCRAITOR SISTER MAXINE ELIZABETH NA FEMALE 73 ACTIVE VERIFIED
KING SR KEVIN NA MALE 43 REMOVED REMOVED AFTER 2 FED GENERAL ELECTIONS IN INACTIVE STATUS
PHILLIPS SR DAYLE KELLEY NA MALE 71 ACTIVE VERIFIED
GRAHAM STEPHEN SR LEGREE NA MALE 60 ACTIVE VERIFIED
MABE STEVE MRS NA NA FEMALE 62 ACTIVE VERIFIED
TIMMONS THOMAS MRS E NA FEMALE 75 ACTIVE VERIFIED
DAVIS W T - MRS NA NA FEMALE 92 INACTIVE CONFIRMATION RETURNED UNDELIVERABLE
LARIMORE WILLIAM MRS NA NA FEMALE 63 ACTIVE VERIFIED
LAMB WILSON MRS C NA FEMALE 82 ACTIVE VERIFIED

First name

  • Honorifics such as MRS appear to have been entered as first names
  • MRS, REV are prefixes
  • SR, MD, SISTER, DR are suffixes

3.6.6 No name

nn, nmn, no name, no middle name

3.6.7 Unknown etc.

unk, unknown, aka, known as, also known as, alias
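
A minimal sketch of how the placeholder terms in sections 3.6.6 and 3.6.7 could be flagged in a name field. The helper name `is_placeholder` and the whole-word, case-insensitive matching are assumptions for illustration; the searches in this analysis may have been run differently.

```r
# Search terms copied from the two lists above (upper-cased).
placeholder_terms <- c(
  "NN", "NMN", "NO NAME", "NO MIDDLE NAME",
  "UNK", "UNKNOWN", "AKA", "KNOWN AS", "ALSO KNOWN AS", "ALIAS"
)

# Whole-word match, so "NN" does not fire inside "ANN".
placeholder_pattern <- paste0("\\b(", paste(placeholder_terms, collapse = "|"), ")\\b")

# Hypothetical helper: TRUE where a name contains a placeholder term.
# Missing values are reported as FALSE rather than NA.
is_placeholder <- function(x) {
  !is.na(x) & grepl(placeholder_pattern, toupper(x), perl = TRUE)
}

is_placeholder(c("NMN", "Unknown", "ANN", "DUNK", NA, "JOHN AKA JACK"))
# TRUE TRUE FALSE FALSE FALSE TRUE
```

Whole-word matching keeps ordinary names such as ANN from being flagged, at the cost of missing placeholders that are run together with other text.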

4 Clean name variables

The aggregated cleaning suggestions are:

Name cleaning suggestions

| Issue | last_name | first_name | midl_name | Action |
|---|---:|---:|---:|---|
| Missing | 122 | 254 | 553,015 | Exclude record if first or last name missing |
| Lower case letters | 50 | 24 | 169 | Map all letters to upper case |
| Digits | 90 | 81 | 299 | Map digits to empty string if not otherwise mapped |
| Zero | 67 | 73 | 130 | Map zero to O if name contains at least one letter and no digits 1-9 |
| One | 20 | 3 | 163 | |
| Two | 1 | 1 | 13 | |
| Three | 1 | 0 | 13 | |
| Four | 3 | 1 | 15 | |
| Five | 3 | 0 | 17 | |
| Six | 1 | 1 | 7 | |
| Seven | 4 | 0 | 8 | |
| Eight | 0 | 2 | 9 | |
| Nine | 4 | 0 | 6 | |
| Hyphen | 34,325 | 5,298 | 6,304 | Map hyphen to empty string |
| Slash | 46 | 9 | 1,032 | Map slash to empty string |
| Single quote | 9,712 | 1,965 | 5,426 | Map single quote to empty string |
| Double quote | 1 | 4 | 19 | Map double quote to empty string |
| Asterisk | 7 | 1 | 23 | Map asterisk to empty string |
| Back tick | 10 | 71 | 33 | Map back tick to empty string |
| Tilde | 1 | 0 | 0 | Map tilde to empty string |
| Underscore | 1 | 17 | 3 | Map underscore to empty string |
| Percent | 1 | 4 | 0 | Map percent to empty string |
| Whitespace | 13,637 | 23,789 | 74,410 | Map whitespace to empty string |
| Period | 44 | 651 | 9,322 | Map period to empty string |
| Comma | 63 | 51 | 58 | Map comma to empty string |
| Backslash | 4 | 3 | 67 | Map backslash to empty string |
| Parentheses | 22 | 105 | 2,107 | Map parentheses to empty string |
| Braces | 0 | 1 | 4 | Map braces to empty string |
| Other characters | 8 | 8 | 15 | Map other characters to empty string |
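
The Zero and Digits actions in the table are conditional, so a small sketch may help. The function name `zero_to_o` is hypothetical, and this is only one possible reading of the rule, not necessarily the implementation used in this project.

```r
# Hypothetical helper for the "Zero" action: replace 0 with O only when the
# name has at least one letter and no digits 1-9.
zero_to_o <- function(x) {
  has_letter  <- grepl("[A-Za-z]", x)   # grepl() returns FALSE for NA
  has_nonzero <- grepl("[1-9]", x)
  fix <- !is.na(x) & has_letter & !has_nonzero
  x[fix] <- gsub("0", "O", x[fix], fixed = TRUE)
  x
}

nm <- c("J0HN", "R2D0", "000", NA)

zero_to_o(nm)
# "JOHN" "R2D0" "000" NA

# "Digits" action applied afterwards: drop any digits not already mapped.
gsub("[0-9]", "", zero_to_o(nm))
# "JOHN" "RD" "" NA
```

Applying the zero rule before the general digit removal matters: in the other order the zero would be deleted instead of being mapped to O.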

  • map all letters to upper case
  • map non-alphanumeric characters to space
  • remove nn, nmn, etc.
  • remove terminal generation suffixes
  • remove honorific prefixes and suffixes
  • map all spaces to the empty string
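
The steps listed above could be chained roughly as follows. This is a sketch in base R under stated assumptions: the function name `clean_name` is hypothetical, and the generation-suffix and honorific token lists are illustrative stand-ins, not the lists derived in this analysis.

```r
# Hypothetical cleaning function applying the listed steps in order.
clean_name <- function(x) {
  x <- toupper(x)                                   # upper case
  x <- gsub("[^A-Z0-9]+", " ", x)                   # non-alphanumeric -> space
  # remove "no name" placeholders (terms from section 3.6.6)
  x <- gsub("\\b(NN|NMN|NO NAME|NO MIDDLE NAME)\\b", " ", x, perl = TRUE)
  x <- trimws(x)
  # remove a generation suffix only at the end (assumed list)
  x <- sub("\\s+(JR|SR|II|III|IV)$", "", x, perl = TRUE)
  # remove honorifics wherever they occur as whole words (assumed list)
  x <- gsub("\\b(MRS|REV|DR|MD|SISTER)\\b", " ", x, perl = TRUE)
  gsub(" ", "", x, fixed = TRUE)                    # all spaces -> empty string
}

clean_name(c("O'Neil-Smith", "rev. robert", "Smith Jr.", "nmn"))
# "ONEILSMITH" "ROBERT" "SMITH" ""
```

Mapping punctuation to a space first, and removing spaces only at the end, keeps tokens such as REV or JR separated as whole words while the removal steps run.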

knitr::knit_exit()