Last updated: 2021-01-13
Knit directory: fa_sim_cal/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20201104) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 3be3559. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: .tresorit/
Ignored: data/VR_20051125.txt.xz
Ignored: output/ent_raw.fst
Ignored: renv/library/
Ignored: renv/staging/
Untracked files:
Untracked: analysis/01-6_clean_vars.Rmd
Untracked: analysis/standardise.txt
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/01-5_check_name.Rmd) and HTML (docs/01-5_check_name.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 3be3559 | Ross Gayler | 2021-01-13 | wflow_publish(c("analysis/index.Rmd", "analysis/01-5*.Rmd")) |
html | 22f0b81 | Ross Gayler | 2021-01-13 | Build site. |
Rmd | d3deb84 | Ross Gayler | 2021-01-13 | Add 01-5 check name |
# Set up the project environment, because each Rmd file knits in a new R session
# so doesn't get the project setup from .Rprofile
# Project setup
library(here)
source(here::here("code", "setup_project.R"))
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.3 ✓ purrr 0.3.4
✓ tibble 3.0.4 ✓ dplyr 1.0.2
✓ tidyr 1.1.2 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
# Extra set up for the 01*.Rmd notebooks
source(here::here("code", "setup_01.R"))
Attaching package: 'glue'
The following object is masked from 'package:dplyr':
collapse
# Extra set up for this notebook
# ???
# start the execution time clock
tictoc::tic("Computation time (excl. render)")
The 01*.Rmd notebooks read the data, filter it to the subset to be used for modelling, characterise it to understand it, check for possible gotchas, clean it, and save it for the analyses proper.
This notebook (01-5_check_name) characterises the name variables in the saved subset of the data.
These variables will be used to construct the main predictors in the compatibility models.
We intend to use the one snapshot file as both the database to be queried and as the set of queries. Consequently, strictly speaking, we don’t need to standardise the name variables because the database and query records are guaranteed to be identical (they will literally be the same record). However, we will look at the name variables with an eye to standardisation because it is never a good idea to statistically model data without having an idea about the quality of the data. We will apply some basic standardisation to the name variables, if appropriate, because it parallels what would be necessary in practice.
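As a minimal sketch of what "basic standardisation" could look like, the hypothetical helper `standardise_name()` below upper-cases a name, replaces anything other than ASCII letters, digits and spaces with a space, and collapses runs of whitespace. This is an assumption for illustration only, not the cleaning that this project applies, and it uses base R so that it is self-contained.

```r
# Hypothetical helper (not part of the project code): basic name standardisation.
standardise_name <- function(x) {
  x <- toupper(x)                              # fold case
  x <- gsub("[^A-Z0-9 ]", " ", x)              # replace punctuation with a space
  x <- gsub(" +", " ", x)                      # collapse runs of spaces
  x <- trimws(x)                               # drop leading/trailing spaces
  x[!is.na(x) & x == ""] <- NA_character_      # treat empty results as missing
  x
}

standardise_name(c("MacQUEEN", "O'BRIEN", "ST.JOHN", "RUSSELL, JR.", NA))
```

Whether punctuation should be replaced by a space (as here) or deleted outright is a design choice: the first keeps "ST.JOHN" and "ST. JOHN" identical, the second keeps "O'BRIEN" and "OBRIEN" identical.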
Define the name variables.
vars_name <- c(
"last_name", "first_name", "midl_name", "name_sufx_cd"
)
Read the usable data. Remember that this consists of only the ACTIVE & VERIFIED records.
# Show the entity data file location
# This is set in code/file_paths.R
f_entity_fst
[1] "/home/ross/RG/projects/academic/entity_resolution/fa_sim_cal_TOP/fa_sim_cal/output/ent_raw.fst"
# get data for next section of analyses
d <- fst::read_fst(
f_entity_fst,
columns = c(vars_name, "sex") # get sex as well for cross-checking
) %>%
tibble::as_tibble()
dim(d)
[1] 4099699 5
Take a quick look at the distributions.
d %>% skimr::skim()
Name | Piped data |
Number of rows | 4099699 |
Number of columns | 5 |
_______________________ | |
Column type frequency: | |
character | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
last_name | 0 | 1.00 | 1 | 21 | 0 | 191996 | 0 |
first_name | 23 | 1.00 | 1 | 19 | 0 | 126589 | 0 |
midl_name | 252695 | 0.94 | 1 | 20 | 0 | 175742 | 0 |
name_sufx_cd | 3869063 | 0.06 | 1 | 3 | 0 | 101 | 0 |
sex | 0 | 1.00 | 3 | 6 | 0 | 3 | 0 |
last_name: 100% filled
first_name: ~100% filled (23 missing)
midl_name: 94% filled
name_sufx_cd: 6% filled
Look at the distributions of name lengths first, before moving on to analyses more focused on standardisation.
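The fill percentages correspond to the complete_rate column reported by skim. A rough base R equivalent, shown on a small made-up data frame (`toy` and its values are illustrative assumptions, not project data), is the proportion of non-missing values per column:

```r
# Toy data standing in for the name columns of `d` (values are made up).
toy <- data.frame(
  last_name    = c("SMITH", "JONES", "LE", "NG"),
  first_name   = c("JO", NA, "ANN", "AL"),
  midl_name    = c(NA, "M", NA, "LEE"),
  name_sufx_cd = c(NA, NA, "JR", NA),
  stringsAsFactors = FALSE
)

# Proportion of non-missing values in each column
fill_rate <- colMeans(!is.na(toy))
round(fill_rate, 2)
```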
Calculate the lengths of the name variables.
x <- d %>%
dplyr::mutate(
len_last = stringr::str_length(last_name),
len_first = stringr::str_length(first_name),
len_midl = stringr::str_length(midl_name)
)
last_name
Voter last name
Look at the distributions of name lengths.
summary(x$len_last)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 5.000 6.000 6.345 7.000 21.000
table(x$len_last, useNA = "ifany")
1 2 3 4 5 6 7 8 9 10
18 2046 53580 393363 864542 1094952 805773 514347 212379 96777
11 12 13 14 15 16 17 18 19 20
33039 12034 6844 4239 2679 1632 824 404 152 73
21
2
x %>%
ggplot() +
geom_histogram(aes(x = len_last), binwidth = 1) +
scale_y_sqrt()
Look at examples of short names.
# length == 1
x %>%
dplyr::filter(len_last == 1) %>%
dplyr::select(ends_with("_name")) %>%
dplyr::arrange(last_name, first_name) %>%
knitr::kable()
last_name | first_name | midl_name |
---|---|---|
A | CHUH | NA |
A | THEK | NA |
H | MOIH | NA |
J | J | NA |
K | HOA | HIEP |
K | NGEO | NA |
K | NIUH | NA |
K | RICHARD | V |
K | SANG | NA |
M | COY | FAY |
N | RENEE | VIVIAN |
R | ANDREW | PERNELL |
R | MARY | NA |
S | PETER | THOMAS |
U | RAYMOND | NA |
X | MARCUS | NA |
X | WILLIE | LARRY |
Y | PRUM | NA |
# length == 2
x %>%
dplyr::filter(len_last == 2) %>%
dplyr::select(ends_with("_name")) %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(last_name, first_name) %>%
knitr::kable()
last_name | first_name | midl_name |
---|---|---|
AR | ORAWAN | P |
DO | HANH THUAN | T |
EA | YOUNG | HOW |
EO | YONG | SUK |
HA | YONG | S |
KO | LINDA | KYONGSUK |
LE | ANDREW | CHAU |
LE | DANH | MINH |
LE | DU | D |
LE | NANCY | NICHOLS |
LE | QUANG | TRAN |
LU | IAN | MICHAEL |
MA | ARNOLD | M |
MA | JAMES | SUNG KAO |
NG | AMY | L0CKAMY |
VO | KHANH | HUU |
WU | KUY | M |
YI | JI SUK | NA |
YU | JUN | HYUK |
YU | XIAO | LI |
Look at examples of long names.
# length == 21
x %>%
dplyr::filter(len_last == 21) %>%
dplyr::select(ends_with("_name")) %>%
dplyr::arrange(last_name, first_name) %>%
knitr::kable()
last_name | first_name | midl_name |
---|---|---|
ALESSANDRETTI-STRAUSS | MARIA | E |
BREWINGTON-SUTHERLAND | LISA | A |
# length >= 20
x %>%
dplyr::filter(len_last >= 20) %>%
dplyr::select(ends_with("_name")) %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(last_name, first_name) %>%
knitr::kable()
last_name | first_name | midl_name |
---|---|---|
ANASTASIOU-JOSEPHIDE | THEODORA | A |
ARDESHIRPOUR-ZARTOSH | PARVIZ | NA |
ARRIAGADA-VALENZUELA | GONZALO | ESTEBAN |
BEDINGFIELD-DEMATTEO | HOLLIS | BEDINGFIELD |
BEN MESSAOUDMESSAOUD | AHMED | BEN |
FERRIOLA-BRUCKENSTEI | ZACHARY | NA |
FRANKFORT-WINNINGHAM | SUSAN | R |
GERKHARDT-GODZIEMSKI | ALICE | ELIZABETH |
HUDSON-CHARLES-PIERR | MONIQUE | NA |
KACZMAREK-HUFFSTETLE | KIM | NA |
KLOCZKOWSKI-BERTRAND | DAWN | M |
MCCUTCHEON-GUTKNECHT | LISA | ANN |
MORRISON-WESTMORELAN | DAWN | IRVING |
NOOHLANHLA GUGULETHE | ALAMILLA | NA |
SCHIAPPACASSE-DEPUTY | STEPHANIE | E |
SOTELO DE LOS SANTOS | MARCOS | ANTONIO |
THEODORDES-GRINESTAF | APRIL | ARLETHA |
THEODORIDES-GRINESTA | APRIL | ARLETHA |
VALL-SPINOSA PERKINS | JESSIE | FAYE |
WASHINGTON-HALFKENNY | DAVID | D |
first_name
Voter first name
Look at the distributions of name lengths.
summary(x$len_first)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.000 5.000 6.000 5.913 7.000 19.000 23
table(x$len_first, useNA = "ifany")
1 2 3 4 5 6 7 8 9 10
8070 3799 99236 525505 1077727 1018768 884199 295743 135014 19359
11 12 13 14 15 16 17 18 19 <NA>
29314 1487 880 345 215 9 4 1 1 23
x %>%
ggplot() +
geom_histogram(aes(x = len_first), binwidth = 1) +
scale_y_sqrt()
Warning: Removed 23 rows containing non-finite values (stat_bin).
Look at the missing names.
x %>%
dplyr::filter(is.na(first_name)) %>%
dplyr::select(ends_with("_name")) %>%
dplyr::arrange(last_name, first_name) %>%
knitr::kable()
last_name | first_name | midl_name |
---|---|---|
ALEXANDER | NA | JASON |
AMEN | NA | NA |
BULLARD | NA | ALEXIS |
BURGESS | NA | NA |
CHESTER | NA | JAMES |
ELSASS | NA | NA |
FRISBY | NA | M |
FRYE WILLIAM C | NA | NA |
FUQUA | NA | MARY |
FUQUA | NA | WILLIAM |
GRAYWOLF | NA | NA |
JUDITH | NA | NA |
KAUCHICK | NA | PAULINE |
MAGENTA | NA | NA |
MALIK | NA | NA |
MCKEEL | NA | LESTER |
MOLET | NA | MICHAEL |
MORRIS | NA | ALEXANDER |
PATTERSON | NA | JOHN DEXTER |
PHOENIX | NA | NA |
SILVERMOON | NA | NA |
WARREN | NA | NA |
ZIMMER | NA | CLIFFORD |
Look at examples of short names.
# length == 1
x %>%
dplyr::filter(len_first == 1) %>%
dplyr::select(ends_with("_name")) %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(last_name, first_name) %>%
knitr::kable()
last_name | first_name | midl_name |
---|---|---|
ANDREWS | A | E |
BARNETTE | C | V |
BENFIELD | J | D |
BOONE | A | C |
BOSTWICK | H | KATHLEEN |
CAMPBELL | W | THOMAS |
HERMAN | L | E |
HOOKER | S | A |
MCDONALL | W | B MRS |
MILLER | J | H |
MILTON | E | D |
OVERMAN | R | DALE |
REIDENBACH | W | SCOTT |
SMITH | J | C |
STRAUB | C | WINIFRED |
TOWNSEND | J | B |
TUTTLE | M | GERTRUDE |
WIGGINS | J | BELTON |
WILLIAMS | A | BLANDENA |
WILLIFORD | W | T |
# length == 2
x %>%
dplyr::filter(len_first == 2) %>%
dplyr::select(ends_with("_name")) %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(last_name, first_name) %>%
knitr::kable()
last_name | first_name | midl_name |
---|---|---|
BOONE | JO | GRAHAM |
BROOKS | SU | TONYA CARYETTE |
CALVILLO | AJ | NA |
CLARK | JO | W |
FAIR | JD | FAIR |
FAULHABER | JO | ANN |
FOWLER | LA | SONDA |
HARVEY | TY | NA |
HOFF | MI | YONG |
JUDD | JO | D |
MCKEE | JO | SHUMATE |
MILLER | AL | NA |
MIMS | JO | CHANDLER |
MULLEN | JO | L |
NGUYEN | HO | NGOC |
NICHOLSON | DE | MELVIN |
TANG | YU | NA |
THOMASON | JO | CARPENTER |
WHICHARD | AL | NA |
WON | UN | T |
2-letter first names appear to be:
Look at the long names.
# length >= 16
x %>%
dplyr::filter(len_first >= 16) %>%
dplyr::select(ends_with("_name")) %>%
dplyr::arrange(last_name, first_name) %>%
knitr::kable()
last_name | first_name | midl_name |
---|---|---|
ANDERSON | MICHAEL-CHEROKEE | DEMCK |
DOUPE | KIMBERLY DANIELLE | WYATT |
ENRIQUEZ | MARIA DEL CARMEN | NA |
FIELDS | ADRIENNE`FELICIA | NA |
LAPPAS-KOTARA | MICHELLE-ADRIENNE | NA |
MIDDLESWORTH | ELIZABETH-LINDSAY | MCCOY |
NAGARAJ | SANTHEBACHAHALLI | S |
NATARAJA | HEGGADADEVANAKOTE | NA |
NGUYEN | THI PHUONG KHAUH | NA |
NUNEZ | MARIANA DE JESUS | N |
ODEMS | MICHAEL-CHRISTOPHER | NA |
PERRY | SHIRLEY ANN-PEPPER | NA |
RODRIGUEZ | MARIA DEL CARMAN | NA |
SUBRAMANIAM | LAKSHMINARAYANAN | NA |
WINKLER | ELIZABETH PORTIS | G |
Long first names appear to be:
midl_name
Voter middle name
These names will often be missing or initials only.
Look at the distributions of name lengths.
summary(x$len_midl)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.00 3.00 5.00 4.73 6.00 20.00 252695
table(x$len_midl, useNA = "ifany")
1 2 3 4 5 6 7 8 9 10 11
826716 10491 289439 440549 651587 705383 508158 227267 114306 30604 20536
12 13 14 15 16 17 18 19 20 <NA>
9807 5186 3514 3379 50 21 8 2 1 252695
x %>%
ggplot() +
geom_histogram(aes(x = len_midl), binwidth = 1) +
scale_y_sqrt()
Warning: Removed 252695 rows containing non-finite values (stat_bin).
Look at the long names.
# length >= 16
x %>%
dplyr::filter(len_midl >= 16) %>%
dplyr::select(ends_with("_name")) %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(last_name, first_name) %>%
knitr::kable()
last_name | first_name | midl_name |
---|---|---|
ARTIST | SYLVIA | JOYCE WIILIAMSON |
BILLOW | LESLEY | ELIZABETH CLAUSS |
BOWDEN | CORA | FRANCES THOMPSON |
BRINKHUIS | VANESSA | INGRID-PRISCILLA |
CALL | LUNIA | ANNTONIA MCCRARY |
DELLA | MEA | CAROLYN ROBINSON |
EXUM | SHEILA | LANENA WHITEHEAD |
GANAWAY | SUSAN | ANN WINTERHALTER |
GULLEY | JOHN | MARCUS DELAFAYETTE |
HARRIS | ANN | PULLER- MARCELINE-ZO |
HARTSFIELD | NAOMI | RUTH SATTERWHITE |
HICKS | NELLIE | BEATRICE-RICHARDSON |
HOGGARD | ANN | DENISE HARRINGTON |
MOORE | VIDA | GWENEVERE BARNER |
RIVERA | RAFAEL | ANTONIO CARAMBOT |
ROGERS | RUBYE | REBECCA/SUDDRETH |
SWINSON | MARY | ELIZABETH FRANCIS |
WHITENER | STEPHANIE | LYNNE WARREN PARKER |
WOOD | T | BENBURY HAUGHTON |
YOUNGER | ZEE | CAMILLE PREVETTE |
# clean up
rm(x)
gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1805588 96.5 2906791 155.3 2906791 155.3
Vcells 29963098 228.7 87580157 668.2 108920871 831.1
name_sufx_cd
Voter name suffix
This is intended for generation markers, e.g. Junior, Senior.
I am not going to use name suffix in entity resolution because age should be sufficient and is much better quality. I will look at what values turn up in the name suffix because the same values sometimes wrongly occur in the main name variables. Knowing what values occur may help us to remove those values from the main name variables.
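To illustrate how knowledge of the suffix values might be used to remove them from the main name variables, here is a minimal sketch. The helper `strip_generation_suffix()` and its default suffix list are hypothetical, and the pattern only removes a suffix that follows a space or comma at the end of the string.

```r
# Hypothetical helper: strip a trailing generation marker from a name string.
# The default suffix list is an assumption, chosen for illustration only.
strip_generation_suffix <- function(x, suffixes = c("JR", "SR", "II", "III", "IV")) {
  pattern <- paste0("[ ,]+(", paste(suffixes, collapse = "|"), ")[.,]*$")
  sub(pattern, "", x)
}

strip_generation_suffix(
  c("RUSSELL, JR.", "FILLINGHAM, II", "DAYE JR.", "WILL,JR", "ST. JOHN")
)
```

A rule like this can produce false positives (a genuine name component that happens to equal a suffix), so in practice the affected records would need to be inspected before applying it.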
d %>% dplyr::select(name_sufx_cd) %>% skimr::skim()
Name | Piped data |
Number of rows | 4099699 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
character | 1 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
name_sufx_cd | 3869063 | 0.06 | 1 | 3 | 0 | 101 | 0 |
table(d$name_sufx_cd, useNA = "ifany") %>% sort() %>% rev()
<NA> JR III SR II IV JR. SR. I V
3869063 153804 29605 27494 14043 3682 1060 226 218 190
111 MRS 11 VI ` VII MR. MS. J E
67 50 28 27 13 9 7 5 5 4
MR C W SCO S REV R N M JD
3 3 2 2 2 2 2 2 2 2
DR. D ANN 0 (JR X WAL VIR TOB Sr.
2 2 2 2 2 1 1 1 1 1
SMI SAM REE RAY Q PLA P ON OD O
1 1 1 1 1 1 1 1 1 1
MS MOO MMO MD MCQ MAC LOC LLL LL LEW
1 1 1 1 1 1 1 1 1 1
LEE LAR L KIT KEN K JR, JAC ING ILI
1 1 1 1 1 1 1 1 1 1
II. H GUY GLE G FOR FAU F M EY EWA
1 1 1 1 1 1 1 1 1 1
ELS DOR DO DIC CUB CHA B ALB AJR A
1 1 1 1 1 1 1 1 1 1
8TH 5 3RD 39 346 2 1V 15 134 070
1 1 1 1 1 1 1 1 1 1
\\ (II
1 1
# get a better look at the cleaned suffixes
d %>%
dplyr::mutate(
sufx = name_sufx_cd %>%
stringr::str_to_upper() %>%
stringr::str_remove_all(pattern = "[^A-Z0-9]") %>% # remove non-alphanumeric
dplyr::na_if("")
) %>%
dplyr::count(sufx) %>%
dplyr::filter(n > 1) %>%
dplyr::arrange(desc(n), sufx) %>%
knitr::kable()
sufx | n |
---|---|
NA | 3869077 |
JR | 154867 |
III | 29605 |
SR | 27721 |
II | 14045 |
IV | 3682 |
I | 218 |
V | 190 |
111 | 67 |
MRS | 50 |
11 | 28 |
VI | 27 |
MR | 10 |
VII | 9 |
MS | 6 |
J | 5 |
E | 4 |
C | 3 |
0 | 2 |
ANN | 2 |
D | 2 |
DR | 2 |
JD | 2 |
M | 2 |
N | 2 |
R | 2 |
REV | 2 |
S | 2 |
SCO | 2 |
W | 2 |
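Some of the cleaned values look like keying variants of generation markers (for example digits typed in place of the letter I). The sketch below shows one way such variants could be recoded. The recode table `sufx_recode` is a hypothetical interpretation of the values, not something established by the data, and `canonical_sufx()` is an illustrative helper in base R.

```r
# Hypothetical recode table; the interpretation of each variant is an assumption.
sufx_recode <- c("111" = "III", "11" = "II", "1V" = "IV")

canonical_sufx <- function(x) {
  x <- toupper(x)
  x <- gsub("[^A-Z0-9]", "", x)                # remove non-alphanumeric characters
  x[!is.na(x) & x == ""] <- NA_character_      # empty after cleaning -> missing
  hit <- !is.na(x) & x %in% names(sufx_recode)
  x[hit] <- unname(sufx_recode[x[hit]])        # recode the assumed variants
  x
}

canonical_sufx(c("JR.", "Sr.", "111", "11", "(II", "`", NA))
```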
Look at issues that might be addressed by standardisation.
For each type of standardisation issue, look at the first, middle, and last names separately, because the issue may manifest differently in each of the name variables.
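Because the same check is repeated for each name variable, the counts can also be obtained in one call. The sketch below is a hypothetical helper (`count_matches()` is not part of the project code) run on made-up data; it uses base R `grepl()`, which returns FALSE for missing values, and the portable class `[[:lower:]]` for lower-case letters.

```r
# Hypothetical helper: count records matching a regular expression in each column.
count_matches <- function(data, vars, pattern) {
  vapply(
    vars,
    function(v) sum(grepl(pattern, data[[v]])),  # grepl() is FALSE for NA
    integer(1)
  )
}

# Toy data (made up) standing in for `d`
toy <- data.frame(
  last_name  = c("MacQUEEN", "SMITH-JONES", "O'NEAL"),
  first_name = c("JoANN", "MARY", NA),
  midl_name  = c(NA, "ANN-MARIE", "McBRIDE"),
  stringsAsFactors = FALSE
)
name_vars <- c("last_name", "first_name", "midl_name")

count_matches(toy, name_vars, "[[:lower:]]")  # lower-case letters
count_matches(toy, name_vars, "-")            # hyphens
```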
d %>% dplyr::select(last_name) %>%
dplyr::filter(stringr::str_detect(last_name, "[a-z]"))
# A tibble: 3 x 1
last_name
<chr>
1 MacQUEEN
2 MacQUEEN
3 BROWN-McCULLOUGH
d %>% dplyr::select(first_name) %>%
dplyr::filter(stringr::str_detect(first_name, "[a-z]"))
# A tibble: 11 x 1
first_name
<chr>
1 JoANN
2 LaVERNE
3 JoANNE
4 JoANN
5 SiROBERT
6 McCKINES
7 DeNEAL
8 McHILDIA
9 JoANN
10 LaSONYA
11 JeROME
d %>% dplyr::select(midl_name) %>%
dplyr::filter(stringr::str_detect(midl_name, "[a-z]"))
# A tibble: 76 x 1
midl_name
<chr>
1 McBRIDE
2 McBRIDE
3 McKINNIE
4 McLAWHORN
5 McKEITHAN
6 McCULLEN
7 MacFRANKLIN
8 McQUEEN
9 McPHAIL
10 McCULLEN
# … with 66 more rows
Check for non-alphanumeric characters in names.
Check for hyphens.
x <- d %>%
dplyr::filter(stringr::str_detect(last_name, "-"))
nrow(x)
[1] 20543
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(last_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
BAZAN-MANSON | ANDREA | NA | NA | FEMALE |
BENITEZ-GRAHAM | ANA | NA | NA | FEMALE |
BERRY-DANIEL | VAUGHN | NA | NA | FEMALE |
BIBB-FREEMAN | TIFFANY | OCTAVIA | NA | FEMALE |
COALE-KRUPA | MARY | KITTY | NA | FEMALE |
EVERETT-GIGLIO | SUZANNE | MARIE | NA | FEMALE |
HARRISON-LAMPTEY | JAMES | CHARLES | NA | MALE |
JEAN-PIERRE | HERBERT | NA | NA | MALE |
JENKINS-JAMES | TREVA | NA | NA | FEMALE |
JOHNSON-DILLENBECK | LINDA | JEAN | NA | FEMALE |
JOHNSON-FORBES | SHAQUILLA | NICOLE JOHNSON | NA | FEMALE |
MACK-PURNELL | JOYCE | A | NA | FEMALE |
MANNING-SHAUB | CHERYL | NA | NA | FEMALE |
POLLARD-GREIF | RHIANNON | ELIZABETH | NA | FEMALE |
RADFORD-BLACK | ANITA | NA | NA | FEMALE |
RICHMOND-GRAVES | VANESSA | D | NA | FEMALE |
SHERRELL-PATTERSON | KRYSTAL | ANN | NA | FEMALE |
STONE-TANENBERG | KAREN | ANNE | NA | FEMALE |
TILLEY-VARELA | MYRA | AMANDA | NA | FEMALE |
WIMBISH-VANDERBECK | LAURA | NA | NA | FEMALE |
x <- d %>%
dplyr::filter(stringr::str_detect(first_name, "-"))
nrow(x)
[1] 3011
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(first_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
ROSAVAGE | ANN-MARIE | NA | NA | FEMALE |
CRENSHAW | CALLIE-ANNE | DOANE | NA | FEMALE |
GLENN | CHIH-TZU | L | NA | FEMALE |
WOODARD | ESTHER-JOAN | SURRETT | NA | FEMALE |
ROUSSEAUX | JEAN-CLAUDE | CHRISTIAN | NA | MALE |
BERARD | JEAN-PAUL | NA | NA | MALE |
ARTIS | JO-ANN | NA | NA | FEMALE |
FURLONG | JO-ANN | ALICE | NA | FEMALE |
DALE | JON-MARC | RYAN | NA | MALE |
COOK | KAWIKA-JAMAL | SAMUEL | NA | MALE |
MILLER | LEE-JAMIL | K | NA | MALE |
BEAVER | RUTH-ANNE | GUST | NA | FEMALE |
YEUNG | SHIN-YIING | NA | NA | FEMALE |
JAN | SHYI-TAI | NA | NA | MALE |
CHU | TE-HSIN | A | NA | FEMALE |
SU | TSUNG-HU | NA | NA | FEMALE |
MOJICA | WILLIAM-JOSEPH | KAILI | NA | MALE |
TSAI | WON-WHEI | NA | NA | FEMALE |
TAPP | YOUNG-SUK | O | NA | FEMALE |
CHANG | YU-JHI(JULIE) | CHEN | NA | FEMALE |
x <- d %>%
dplyr::filter(stringr::str_detect(midl_name, "-"))
nrow(x)
[1] 3883
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(midl_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
CABRIELE | DEBRA | ANN-MARIE | NA | FEMALE |
CONRAD | HEATHER | CLEONA-JANE | NA | FEMALE |
REEVES | ANDREW | DAVID-JOEL | NA | MALE |
BROOKS | DARRICK | E-ALSTON | NA | MALE |
EBRAHIM | EMAD | ELDIN-YASHAR | NA | MALE |
WILSON | KAY | FRANCES-LAWS | NA | FEMALE |
LANE | MICHELLE | GAYE-PRESTON | NA | FEMALE |
YUAN | DEREK | HAW-LUEN | NA | MALE |
PATE | BARBARA | JEAN-DALE | NA | FEMALE |
CHAN | GODWIN | KWOK-YIN | NA | MALE |
HUGHES | RACHEL | LYNN-INGRAM | NA | FEMALE |
MORTON | QUIANA | MAISHA-ANN | NA | FEMALE |
SAYE | ROBYN | MOO-YOUNG | NA | FEMALE |
DIXON | STANLEY | RAY-HAMILTON | NA | MALE |
PARKER | BRANDON | SHON-DAY | NA | MALE |
CYRAN | JACLYN | SUZANNE-MARIE | NA | FEMALE |
BOWMAN | HELEN | TAUSSIG-HAUPT | NA | FEMALE |
BELL | ROSE | TIEH-CHIN | NA | FEMALE |
COLDREN | RUTH | VIOLA-SHEATS | NA | FEMALE |
KIM | LEGIA | YOUNG-SON | NA | FEMALE |
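Hyphenated last names are mostly double-barrelled names, so one option a record-linkage system might consider is to split them into components that can be compared individually. The following is only a sketch of that idea in base R, not something this notebook does; the example values are taken from the tables above.

```r
last_names <- c("JOHNSON-FORBES", "JEAN-PIERRE", "HUDSON-CHARLES-PIERR", "SMITH")

# Split on hyphens; each list element holds the components of one name
parts <- strsplit(last_names, "-", fixed = TRUE)

# Number of components per name (1 means no hyphen)
n_parts <- lengths(parts)
n_parts
```

Names such as JEAN-PIERRE show why splitting is not always appropriate: the hyphen there joins parts of a single name rather than two family names.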
Check for single quotes (apostrophes).
x <- d %>%
dplyr::filter(stringr::str_detect(last_name, "'"))
nrow(x)
[1] 4920
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(last_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
D’ARBEAU | STEPHEN | B | NA | MALE |
I’ANSON-JACKSON | JENNIFER | NA | NA | FEMALE |
O’BRIANT | MABEL | S | NA | FEMALE |
O’BRIEN | MARK | S | NA | MALE |
O’BRIEN | PATRICK | WAYNE | SR | MALE |
O’BRIEN | WILLIAM | PATRICK | NA | MALE |
O’CONNELL | TINA | DEE | NA | FEMALE |
O’MALLEY-HELMS | COLLEEN | E | NA | FEMALE |
O’MEARA | MORGAN | STUART | NA | MALE |
O’NEAL | DORIS | TAYLOR | NA | FEMALE |
O’NEAL | KELLY | NA | NA | FEMALE |
O’NEAL | BETTY | MAGALENE D | NA | FEMALE |
O’NEAL | TAKINA | L | NA | FEMALE |
O’NEAL | LINDA | F | NA | FEMALE |
O’NEAL | CHARLES | FRANKLIN | NA | MALE |
O’NEIL | DONNA | LOUISE | NA | FEMALE |
O’NEILL | LUCILLE | W | NA | FEMALE |
O’QUINN | VICKIE | LEE | NA | FEMALE |
O’ROURKE | JOHN | F | NA | MALE |
O’ROURKE | JEFFERY | JAMES | NA | MALE |
x <- d %>%
dplyr::filter(stringr::str_detect(first_name, "'"))
nrow(x)
[1] 1226
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(first_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
KNIGHT | A’NDREA | LANIER | NA | FEMALE |
RICHARDSON | ANDRE’ | STEVEN | NA | MALE |
CORBETT | D’ANDREA | PERE | NA | FEMALE |
PASOUR | D’ETTA | TAYLOR | NA | FEMALE |
JONES | DEONTAE’ | QUINN | NA | MALE |
EDWARDS | DESIRE’ | DENISE | NA | FEMALE |
WILKINS | FAR’D | HAKEEM | NA | MALE |
RICHARDSON | J’MAINE | NMN | NA | MALE |
ALSTON | J’MIA | KAE | NA | FEMALE |
DUNLAP | JA’TINA | R | NA | FEMALE |
SUITT | L’TONYA | NA | NA | FEMALE |
LITTLEJOHN | LA’KANYA | MICHELLE | NA | FEMALE |
HALL | LA’KETTA | CHENTAL | NA | FEMALE |
DOWELL | LA’TONYA | YVETTE | NA | FEMALE |
FORD | O’DENA | NA | NA | FEMALE |
JACKSON | O’NEIL | NA | NA | MALE |
HILBURN | O’NEILL | NA | NA | MALE |
LEBEAU | RENE’ | DOMITIEN | NA | MALE |
WEATHERBEE | SADE’ | SHANNON | NA | FEMALE |
MITCHELL | SHANTAE’ | T | NA | FEMALE |
x <- d %>%
dplyr::filter(stringr::str_detect(midl_name, "'"))
nrow(x)
[1] 3152
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(midl_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
GRILLO | LUIS | CHE’ | NA | MALE |
CRAWFORD | NICKOLAS | D’ANDRE | NA | MALE |
LARSEN | HEATHER | D’ANN | NA | FEMALE |
SINGHATEH | NICHELLE | DY’VONNE | NA | FEMALE |
HARPER | DUNSEY | LA’TAZE | NA | MALE |
BATTLE | IKEDA | LE’RECIA | NA | FEMALE |
O’CONNELL | KAREN | O’BRIEN | NA | FEMALE |
SPERRY | ANN | O’BRIEN | NA | FEMALE |
GINYARD | DEEDRICK | O’BRIEN | NA | MALE |
ARNEY | KATHLEEN | O’DWYER | NA | FEMALE |
DOWNES | ANN | O’HARA | NA | FEMALE |
VANHOOK | BRANDON | O’NEAL | NA | MALE |
JONES | ROBIN | O’NEIL | NA | FEMALE |
JOHNSON | DAESHAWAN | O’NEIL | NA | MALE |
MANNS | RUSSELL | O’NEIL | JR | MALE |
CALLOWAY | TOMIKA | RENEE’ | NA | FEMALE |
THOMAS | DENA | RENEE’ | NA | FEMALE |
CALLAHAN | PAMELA | RENEE’ | NA | FEMALE |
GRAHAM | QUINDERIA | SH’RON | NA | FEMALE |
JAMES | BRITTANY | VONT’E | NA | FEMALE |
Check for periods.
x <- d %>%
dplyr::filter(stringr::str_detect(last_name, "\\."))
nrow(x)
[1] 11
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(last_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
BINGHAM JR. | AMES | EDMOND | NA | MALE |
DAYE JR. | JAMES | NA | JR | MALE |
RUSSELL, JR. | KERMITT | PATRICK | NA | MALE |
ST. CLAIR | JACK | LEE | NA | MALE |
ST. CYR | CANDICE | NICOLE | NA | FEMALE |
ST. GEORGE | MARTHA | S | NA | FEMALE |
ST. GERMAIN | AMY | NA | NA | FEMALE |
ST. JOHN | JESSICA | JO | NA | FEMALE |
ST. LAWRENCE | ELIZABETH | W | NA | FEMALE |
ST.CLAIRE | KEVIN | WAYNE | NA | MALE |
ST.JOHN | JOANN | DIMAGGIO | NA | FEMALE |
x <- d %>%
dplyr::filter(stringr::str_detect(first_name, "\\."))
nrow(x)
[1] 120
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(first_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
NORRIS | A.T. | NA | NA | MALE |
EVINS | BETTY L. | CURRIN | NA | FEMALE |
BUIE | BEVERLY D. | COOKE | NA | FEMALE |
DUNN | C. SHAY | HARRELSON | NA | FEMALE |
SWOFFORD | D. | MILYNN | NA | FEMALE |
ROSS | E. | TRAVIS | JR | MALE |
INGOLFSSON | E. JUANITA | O’BRIEN | NA | FEMALE |
LILLEY | G. | C. | NA | MALE |
NAVARRE | J. | RICHARD | II | MALE |
JARRETT | J. | REID | NA | MALE |
AINSLEY | J. (JULIUS) | T.(THOMAS) | NA | MALE |
RENDLEMAN | J.T. | NA | NA | MALE |
GIBSON | M. | COLINE | NA | FEMALE |
HICKS | MARY E. | PALMER | NA | FEMALE |
UNDERWOOD | NORMA J. | PHILLIPS | NA | FEMALE |
GARSKA | P.J. | JAN DE BEWR | NA | FEMALE |
USSERY | PRISCILLA B. | SANDERS | NA | FEMALE |
BOUCHER | T. RENEE | NA | NA | FEMALE |
GABLE | THOMAS J. | WESTLEY | NA | MALE |
MUSSON | W. | JAMES | NA | MALE |
x <- d %>%
dplyr::filter(stringr::str_detect(midl_name, "\\."))
nrow(x)
[1] 2233
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(midl_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
KYKER | JACOB | C. | NA | MALE |
TOWNSEND | IRIS | D. MELENDEZ | NA | FEMALE |
FLOYD | WALTER | E. | NA | MALE |
WARMACK | WILLIAM | G. | NA | MALE |
MASON | PEGGY | H. | NA | FEMALE |
ENOCH | WILLIAM | H. | NA | MALE |
MITCHELL | JOHN | J. | NA | MALE |
HAMILTON | MILON | J. | NA | MALE |
PARROTT | ULYSSES | J. | JR | MALE |
NEWBLE | ANDRE | L.K. | NA | MALE |
LENAHAN | LOIS | M. | NA | FEMALE |
JONES | TANETTA | M. | NA | FEMALE |
PHILLIPS | HELEN | M. | NA | FEMALE |
MOORE | DOROTHY | M. | NA | FEMALE |
EDWARDS | STANLEY | M. | NA | MALE |
BROOKS | MARILYN | M.LEDFORD | NA | FEMALE |
WAVERLY | TRACY | R. | NA | FEMALE |
SCOTT | HENRY | R. | NA | MALE |
GRAHAM | KENDRA | T. | NA | FEMALE |
ESPERGREN | MARY | T. | NA | FEMALE |
Check for commas.
x <- d %>%
dplyr::filter(stringr::str_detect(last_name, ","))
nrow(x)
[1] 2
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(last_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
FILLINGHAM, II | ROBERT | E | NA | MALE |
RUSSELL, JR. | KERMITT | PATRICK | NA | MALE |
x <- d %>%
dplyr::filter(stringr::str_detect(first_name, ","))
nrow(x)
[1] 4
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(first_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
PHILLIPS | FRANK, | NA | JR | MALE |
HICKS | MARION, | NA | SR | MALE |
CANIPE | NOAH, | NA | JR | MALE |
MCADAMS | WILL,JR | NA | NA | MALE |
x <- d %>%
dplyr::filter(stringr::str_detect(midl_name, ","))
nrow(x)
[1] 12
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(midl_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
FAUCETTE | JESSE | EDWARD, J | NA | MALE |
BRASWELL | ROBERT | ELLIS, J | NA | MALE |
MARTIN | LLOYD | FRANKLIN, S | NA | MALE |
GAY | ROBERT | HENRY, III. | NA | MALE |
FERGUSON | STANTON | HYDE, J | NA | MALE |
CLARK | COLEMAN | JACKSON, I | NA | MALE |
BARNES | RUSSELL | JOSEPH, J | NA | MALE |
PIERCE | RUTH | P, | NA | FEMALE |
COVINGTON | EDNA(MRS | PERRY, JR) | NA | FEMALE |
SCARBOROUGH | JOHN | R, | NA | MALE |
SHEARIN | ANDREW | THOMAS, S | NA | MALE |
WILLIAMS | ERVIN | W., SR., | NA | MALE |
Check for other non-alphanumeric characters.
x <- d %>%
dplyr::filter(stringr::str_detect(last_name, "[^ a-zA-Z0-9\\.,'-]"))
nrow(x)
[1] 31
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(last_name, sex) # %>%
# A tibble: 20 x 5
last_name first_name midl_name name_sufx_cd sex
<chr> <chr> <chr> <chr> <chr>
1 "BOYLES`" LINDA BROWN <NA> FEMALE
2 "BRYANT`" WILLIAM STEWART <NA> MALE
3 "COLLINS/SISK" RHONDA L <NA> FEMALE
4 "D*AMICO" PATRICIA MARIE <NA> FEMALE
5 "D*AMICO" MEGAN MARIE <NA> FEMALE
6 "GALINSKY/MALAGUTI" DANA ANNE <NA> FEMALE
7 "GOSHEN\\" DIXIE M <NA> FEMALE
8 "LA\"BEE" DELACRUZ <NA> <NA> FEMALE
9 "MARTIN/HUFF" ELLEN MARIE <NA> FEMALE
10 "MORRISON`" HAZEL M <NA> FEMALE
11 "NICHOLS/BROWN" MARY SUE <NA> FEMALE
12 "O*BRIEN" COLIN JAMES <NA> MALE
13 "O*TOOLE" PETER TERRENCE <NA> MALE
14 "REAVIS/LONG" SHAWN MICHELLE <NA> FEMALE
15 "RHONEY/PETERS" DONNA <NA> <NA> FEMALE
16 "SCHERM%MARTIN" WYATT <NA> <NA> FEMALE
17 "STRTHEIT\\" LOLA C <NA> FEMALE
18 "TALBERT/GRAHAM" BRENDA <NA> <NA> FEMALE
19 "TUCKER/JACKSON" LAVONDA LYNN <NA> FEMALE
20 "WOODARD`" JASON WARREN <NA> MALE
# knitr::kable() # some of the characters break the kable formatting
x <- d %>%
dplyr::filter(stringr::str_detect(first_name, "[^ a-zA-Z0-9\\.,'-]"))
nrow(x)
[1] 102
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(first_name, sex) # %>%
# A tibble: 20 x 5
last_name first_name midl_name name_sufx_cd sex
<chr> <chr> <chr> <chr> <chr>
1 POTEAT "(KAY)" ANNE CATH <NA> FEMALE
2 FIELDS "ADRIENNE`FELICIA" <NA> <NA> FEMALE
3 STEELE "AR`KISHA" FERNESE <NA> FEMALE
4 JACKSON "AR`MONIE" <NA> <NA> FEMALE
5 STUBBS "BRITNE`" ELIZABETH <NA> FEMALE
6 CLARK "CANDERE`" L <NA> FEMALE
7 SELF "CATHERINE`" MARIE <NA> FEMALE
8 FABIAN "D`ARLINE" D <NA> FEMALE
9 INGRAM "D`WON" LAMONTE <NA> MALE
10 NICHOLS "DORIS ( MRS W" <NA> <NA> FEMALE
11 STEWART "JA`VONDA" NICHOLE <NA> FEMALE
12 SPENCER "JAMES (JIM)" N <NA> MALE
13 CARTER "JOSE`" PIERRE <NA> MALE
14 HEMPHILL "LA`CHERICA" EVON <NA> FEMALE
15 CHESTNUT "LA`WANDA" F <NA> FEMALE
16 DUNN "MARY (\"PETE\")" BURNETTE <NA> FEMALE
17 JERNOVICS "MARY SUSAN/" R <NA> FEMALE
18 KERN "O (BUDDY)" R <NA> MALE
19 FOSTER "OTIS(NMN)" JR <NA> MALE
20 DAVIDOV "ZVIYA`CRYSTAL" <NA> <NA> FEMALE
# knitr::kable() # some of the characters break the kable formatting
x <- d %>%
dplyr::filter(stringr::str_detect(midl_name, "[^ a-zA-Z0-9\\.,'-]"))
nrow(x)
[1] 1097
x %>%
dplyr::slice_sample(n = 20) %>%
dplyr::arrange(midl_name, sex) %>%
knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
KEENE | JOSEPHINE | (MRS OTIS) | NA | FEMALE |
PIERMARINI | JACLYN | (NMN) | NA | FEMALE |
TORAIN | ROOSEVELT | (NMN) | JR | MALE |
WILEY | ROOSEVELT | (NMN) | JR | MALE |
HARVEY | NATHANIEL | (NMN) | NA | MALE |
EULISS | MAX | (NMN) | NA | MALE |
ASHE | WILLARD | (NMN) | NA | MALE |
CAMERON | CHARLES | (NMN) | JR | MALE |
CRUTCHFIELD | WAYNE | (NMN) | NA | MALE |
ROBERTS | ROBIN | (NMN) | NA | MALE |
COHEN | SETH | (NMN) | NA | MALE |
DEMAS | DOLORIS | A/GEARHART | NA | MALE |
MUSSELWHITE | ALLISON | ELAINE/HUMPH | NA | FEMALE |
BASS | H | J (HUBERT) | NA | MALE |
BRITT | ANGELA | KAY / ROGERS | NA | FEMALE |
LOCKLEAR | MINNIE | LEE/JONES | NA | FEMALE |
DREW | VIRGINIA | M/KELLEY | NA | FEMALE |
BROOKS | WILLIAM | MACK (BILL) | NA | MALE |
FRIEDRICH | DELLA | MAE /KEYS | NA | FEMALE |
ALSTON | MEI | WAN/DAN | NA | FEMALE |
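The checks above show sampled records, but it can also be useful to tabulate which unexpected characters occur and how often. A minimal base R sketch, using a few values copied from the examples above (the vector `x` is illustrative, not the full data):

```r
# Example values taken from the outputs above, plus a missing value
x <- c("BOYLES`", "COLLINS/SISK", "D*AMICO", "SCHERM%MARTIN", "O'BRIEN", "ST. JOHN", NA)

x <- x[!is.na(x)]  # drop missing values before matching

# Extract every character outside the allowed set (space, letters, digits, . , ' -)
hits <- regmatches(x, gregexpr("[^ a-zA-Z0-9.,'-]", x))

# Frequency of each unexpected character
char_counts <- sort(table(unlist(hits)), decreasing = TRUE)
char_counts
```

Such a table makes it easier to see patterns like the backtick and asterisk apparently standing in for an apostrophe, and the slash apparently standing in for a hyphen.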
Check for digits.
Check for zero.
x <- d %>%
  dplyr::filter(stringr::str_detect(last_name, "0"))
nrow(x)
[1] 29
x %>%
  dplyr::slice_sample(n = 20) %>%
  dplyr::arrange(last_name, sex) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
ALEM0N | NOE | A | NA | MALE |
BOLAD0 | PAULA | HUTCHENS | NA | FEMALE |
CAPUT0 | BARBARA | DAVIS | NA | FEMALE |
CONR0Y | WILLIAM | COURTNEY | NA | MALE |
D0WNS | MARIO | ENRICO | NA | MALE |
EAT0N | VICKIE | TUGGLE | NA | FEMALE |
ESC0BEDO | AUDREY | ANN | NA | FEMALE |
FERNANDEZ-BRAV0 | GIOVANNI | NA | NA | MALE |
GUARDAD0 | MANUEL | FELIX | NA | MALE |
J0HNSON | LUCILLE | FRANCES | NA | FEMALE |
JOHNS0N | MICHAEL | NA | NA | MALE |
MCD0UGAL | BETTY | JEAN | NA | FEMALE |
OCONN0R | GERALDINE | LOUISE | NA | FEMALE |
PEREZ-NAVARR0 | CAROLE | SHAY | NA | FEMALE |
R0CCO | CHRISTOPHER | NA | NA | MALE |
REYN0LDS | ADAM | DANIEL | NA | MALE |
RUSS0 | ANGEL | MARIE | NA | FEMALE |
SIMPS0N | MARY | ANN | NA | FEMALE |
WIT0SKY | MICHAEL | ADAM | NA | MALE |
YATSK0 | JEANETTE | MARIE | NA | FEMALE |
x <- d %>%
  dplyr::filter(stringr::str_detect(first_name, "0"))
nrow(x)
[1] 33
x %>%
  dplyr::slice_sample(n = 20) %>%
  dplyr::arrange(first_name, sex) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
PETTY | ALLIS0N | JEAN | NA | FEMALE |
BROWN | C0LBY | TODD | NA | MALE |
COOPER | C0RDELIA | P | NA | FEMALE |
WHITTEMORE | D0LORES | H | NA | FEMALE |
LOWE | D0NNA | G | NA | FEMALE |
BRANDT | J0HN | C | NA | MALE |
CRANFILL | J0HN | NA | NA | MALE |
ADAMS | J0HN | WILLIAMS | NA | MALE |
TANNAHILL | J0SEPH | ERIC | NA | MALE |
WILLIAMS | M0NIKA | UDANA | NA | FEMALE |
KEENAN | MARY-J0 | NA | NA | FEMALE |
SHEPHERD | OTH0 | L | NA | MALE |
THOMAS | P0LLY | BROWN | NA | FEMALE |
RAMIREZ | REYNALD0 | G | NA | MALE |
BUIE | S0NTE | Y | NA | FEMALE |
MITCHELL | SHANN0N | ARLINE | NA | FEMALE |
JOHNSON | T0NYA | BETH | NA | FEMALE |
MONK | T0NYA | SIVLEY | NA | FEMALE |
RUFFIN | TIM0THY | ONEILL | NA | MALE |
KENNEDY | V0NCIEAL | LEE | NA | FEMALE |
x <- d %>%
  dplyr::filter(stringr::str_detect(midl_name, "0"))
nrow(x)
[1] 77
x %>%
  dplyr::slice_sample(n = 20) %>%
  dplyr::arrange(midl_name, sex) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
PONGPAIROJ | AMANDA | 0 | NA | FEMALE |
IVESTER | WILLIAM | 0DELL | NA | MALE |
MOORE | EVA | 0MAE | NA | FEMALE |
LANE | KATHLEEN | 0VERTON | NA | FEMALE |
SODAGAR | EASA | 2205 | NA | MALE |
FRENCH | SHNETTA | ALEXANDER080572 | NA | FEMALE |
NEWSOME | MARK | ANTH0NY | NA | MALE |
BRODIE | WILLIAM | C1010 | NA | MALE |
SMITH | BRODY | CO0PER | NA | MALE |
OROPEZA | AMILCAR | COL0N | NA | MALE |
LUCK | GENA | DON0HOO | NA | FEMALE |
STOLLBRINK | KATHY | J0 | NA | FEMALE |
NAYLOR | ANGELA | LY0NS | NA | FEMALE |
MCKOY | LILLY | M00RE | NA | FEMALE |
JONES | RASHAWN | M0NIQUE | NA | FEMALE |
MARSHALL | MONICA | NICH0LE | NA | FEMALE |
DAVIS | LEANN | RUNY0N | NA | FEMALE |
HOLTON | RANDY | SC0TT | NA | MALE |
NATION | LAVONIA | V0SS | NA | FEMALE |
HASH | MYRTLE | Y0UNG | NA | FEMALE |
Check for one.
x <- d %>%
  dplyr::filter(stringr::str_detect(last_name, "1"))
nrow(x)
[1] 1
x %>%
  dplyr::slice_sample(n = 20) %>%
  dplyr::arrange(last_name, sex) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
SATTERFIELD 111 | CHARLES | MASON | NA | MALE |
x <- d %>%
  dplyr::filter(stringr::str_detect(first_name, "1"))
nrow(x)
[1] 0
x <- d %>%
  dplyr::filter(stringr::str_detect(midl_name, "1"))
nrow(x)
[1] 39
x %>%
  dplyr::slice_sample(n = 20) %>%
  dplyr::arrange(midl_name, sex) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
PATTERSON | CARL | 11 | NA | MALE |
ADAMS | RALPH | 11 | NA | MALE |
REED | CHARLES | 11 | NA | MALE |
WILLIAMS | JOSEPH | 11 | NA | MALE |
BEST | KENNETH | 111 | NA | MALE |
LOPEZ | CARLOS | 111 | NA | MALE |
QUERY | FRED | 111 | NA | MALE |
FEATHERSTONE | GEORGE | 111 | NA | MALE |
FREEZE | HOMER | 111 | NA | MALE |
KLUTTZ | JOE | 111 | NA | MALE |
MCGOVERN | WILLIAM | 111 | NA | MALE |
WINECOFF | DAVID | 111 | NA | MALE |
JOHNSON | ULUS | 111 | NA | MALE |
COOKE | GEORGE | 111 | NA | MALE |
HOWERIN | MICHAEL | DALE401 | NA | MALE |
HUNTER | MORDECAI | J1-TO | NA | MALE |
FAICLOTH | TIMOTHY | LOUIS7100 | NA | MALE |
BREEN | TERRANCE | MICHAEL146 | NA | MALE |
PATTERSON | CARLA | NADINE DOUGLAS1 | NA | FEMALE |
PLESS | JOAN | WRIGHT2106 | NA | FEMALE |
Check for other digits.
x <- d %>%
  dplyr::filter(stringr::str_detect(last_name, "[2-9]"))
nrow(x)
[1] 1
x %>%
  dplyr::slice_sample(n = 20) %>%
  dplyr::arrange(last_name, sex) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
ALBER5TSON | BASIL | ERVIN | NA | MALE |
x <- d %>%
  dplyr::filter(stringr::str_detect(first_name, "[2-9]"))
nrow(x)
[1] 2
x %>%
  dplyr::slice_sample(n = 20) %>%
  dplyr::arrange(first_name, sex) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
SPIVEY | FR4ANK | THOMAS | SR | MALE |
CHILTON | J8IMMIE | HERBERT | NA | MALE |
x <- d %>%
  dplyr::filter(stringr::str_detect(midl_name, "[2-9]"))
nrow(x)
[1] 24
x %>%
  dplyr::slice_sample(n = 20) %>%
  dplyr::arrange(midl_name, sex) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex |
---|---|---|---|---|
SODAGAR | EASA | 2205 | NA | MALE |
YOUNG | WANWYNE | 4625 | NA | FEMALE |
CLARKE | MINERVA | 4932 | NA | FEMALE |
PHAIR | IDELL | 8017 | NA | FEMALE |
FRENCH | SHNETTA | ALEXANDER080572 | NA | FEMALE |
BEACHAM | HEATHER | ANDERSON9104576 | NA | FEMALE |
SHUMAKER | RUTH | ANN BURTON47 | NA | FEMALE |
KOERNER | JENNIFER | ANN155 | NA | FEMALE |
WARD | EVA | B2957 | NA | FEMALE |
HOWERIN | MICHAEL | DALE401 | NA | MALE |
GLOVER | DIONNE | LYNN1820 | NA | FEMALE |
GUIDO | DEANA | LYNN2513 | NA | FEMALE |
BECHTEL | TERESA | MARIE103062 | NA | FEMALE |
BREEN | TERRANCE | MICHAEL146 | NA | MALE |
HILL | ZEB | MITCHELL368 | NA | MALE |
BLAIR | ESSIE | MIZELLE25248249 | NA | FEMALE |
PERKINS | TERESA | ROSENBAUM3305 | NA | FEMALE |
TOOMES | BRIAN | SCOTT3450 | NA | MALE |
PYRTLE | PHILLIP | W5RAY | SR | MALE |
PLESS | JOAN | WRIGHT2106 | NA | FEMALE |
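The digit checks above repeat the same filter for each name column and each digit pattern. As a sketch of how the counts could be collected into one table, the loop below applies the three patterns to the three columns. The data frame `toy` is a hypothetical stand-in for `d` (a few values are modelled on rows shown above), and the names `patterns`, `cols`, and `counts` are introduced only for this illustration.

```r
# Hypothetical toy data standing in for `d`
toy <- data.frame(
  last_name  = c("J0HNSON", "SMITH", "SATTERFIELD 111", "ALBER5TSON"),
  first_name = c("LUCILLE", "J0HN", "CHARLES", "BASIL"),
  midl_name  = c("FRANCES", NA, "MASON", "2205"),
  stringsAsFactors = FALSE
)

# The three digit patterns used in the checks above
patterns <- c(zero = "0", one = "1", other = "[2-9]")
cols <- c("last_name", "first_name", "midl_name")

# Rows are name columns, columns are digit patterns.
# na.rm = TRUE drops missing names, as dplyr::filter() does for NA conditions.
counts <- sapply(patterns, function(p) {
  sapply(cols, function(col) {
    sum(stringr::str_detect(toy[[col]], p), na.rm = TRUE)
  })
})
counts
```

Each cell corresponds to one `nrow(x)` result in the checks above, so the whole set of counts can be compared at a glance.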
Look for special words that shouldn’t be in names.
Define word patterns to search for.
# honorifics
w_hons <- c(
  "MR", "MISTER", "MASTER", "MRS", "MS", "MISS",
  "REV", "REVEREND", "SR", "SISTER", "BR", "BROTHER",
  "FATHER", "MOTHER", "PASTOR", "ELDER", "BISHOP",
  "DR", "DOCTOR", "MD", "PROF", "PROFESSOR"
)
# generation suffixes
w_gen <- c(
  "JR", "JNR", "JUNIOR", "SR", "SNR", "SENIOR",
  "1ST", "2ND", "3RD", "4TH", "5TH", "6TH", "7TH", "8TH",
  "FIRST", "SECOND", "THIRD", "FOURTH", "FIFTH", "SIXTH", "SEVENTH", "EIGHTH", "EIGHTTH",
  "1", "2", "3", "4", "5", "6", "7", "8",
  "I", "II", "III", "IIII", "IV", "V", "VI"
)
# special values
w_spec <- c(
  "NN", "NMN", "NAME",
  "UNK", "UNKNOWN", "AKA", "KNOWN AS", "ALSO KNOWN AS", "ALIAS",
  "BLIND"
)
# test records; the last entry matches any letter repeated three or more times
w_test <- c(
  "TEST", "TST", "DUMMY", "VOTER", "([A-Z])\\1{2,}"
)
# regular expression to match words
w_regexp <-
  c(w_hons, w_gen, w_spec, w_test) %>% # all special words
  unique() %>% # make it a set
  dplyr::setdiff( # remove words that appear to mostly be validly used
    c(
      "BISHOP",
      "BLIND",
      "BROTHER",
      "DOCTOR",
      "ELDER",
      "FIRST",
      "JUNIOR",
      "MASTER",
      "MISS",
      "MISTER",
      "PASTOR",
      "SENIOR",
      "TEST",
      "THIRD",
      "VOTER"
    )
  ) %>%
  glue::glue(x = ., "\\b{x}\\b") %>% # must be words
  glue::glue_collapse(sep = "|") # search for any
x <- d %>%
  dplyr::mutate(
    match =
      last_name %>%
      stringr::str_to_upper() %>%
      stringr::str_replace_all(pattern = "[^ A-Z]", replacement = " ") %>%
      stringr::str_squish() %>%
      stringr::str_extract(pattern = w_regexp)
  ) %>%
  dplyr::filter(!is.na(match))
nrow(x)
[1] 124
x %>%
  dplyr::arrange(match, sex, last_name, first_name) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex | match |
---|---|---|---|---|---|
WILLIAMSON DR | IRVIN | D | NA | MALE | DR |
I’ANSON-JACKSON | JENNIFER | NA | NA | FEMALE | I |
BIREN II | WILLIAM | GEORGE | NA | MALE | II |
CRITTENDON II | WILLIAM | BURRELL | NA | MALE | II |
EVANS II | DONALD | M | NA | MALE | II |
FILLINGHAM, II | ROBERT | E | NA | MALE | II |
GOODWIN II | PAUL | J | NA | MALE | II |
GREEN II | BILLY | HOWARD | NA | MALE | II |
METTS II | CAREY | MONTGOMERY | NA | MALE | II |
MILHORN II | JOSEPH | JAMES | NA | MALE | II |
PERSON II | DARYEL | JAMES | NA | MALE | II |
SEABOLD II | GERALD | W | NA | MALE | II |
STANLEY II | WILLIAM | A | NA | MALE | II |
TAYLOR II | ROBERT | D | NA | MALE | II |
THOMBS II | DANIEL | EUGENE | NA | MALE | II |
WATSON II | ROBERT | NATHANIEL | NA | MALE | II |
WORD II | JOE | NAHAN | NA | MALE | II |
AUSLEY III | PRESTON | ALEXANDER | NA | MALE | III |
BEATTY III | CURTIS | M | NA | MALE | III |
BLACKWELDER III | DWIGHT | MCNAIRY | NA | MALE | III |
BOONE III | JAMES | HENRY | NA | MALE | III |
BOSQUEZ III | RICHARD | NA | NA | MALE | III |
CHAPPELL III | TRAVIS | NA | NA | MALE | III |
COCKERHAM III | BOBBY | LEE | NA | MALE | III |
CONNELL III | THOMAS | JOSEPH | NA | MALE | III |
FAULKNER III | HOWARD | VERNON | NA | MALE | III |
GOODWIN III | WARD | ALEXANDER | NA | MALE | III |
GROUSE III | CHARLES | J | NA | MALE | III |
HARRIS III | WILLIAM | T | NA | MALE | III |
KNOX III | JOHN | J | NA | MALE | III |
LANE III | WILLIAM | JAMES | NA | MALE | III |
MCGUIRT III | JAMES | WILLIAM | NA | MALE | III |
MILLER III | JOHNNIE | H | NA | MALE | III |
MOORE III | JAMES | P | NA | MALE | III |
NEWSOME III | THOMAS | LESLIE | NA | MALE | III |
PEACOCK III | EDWARD | JACKSON | NA | MALE | III |
PETERS III | MARION | HOWELL | NA | MALE | III |
PRUDEN III | THOMAS | EUGENE | NA | MALE | III |
REDFEARN III | WILBERT | NA | NA | MALE | III |
SMITH III | GUY | R | NA | MALE | III |
THOMPSON III | EMERY | NA | NA | MALE | III |
BAKER IIII | WILLAIM | RAINEY | NA | MALE | IIII |
BUXTON IV | SAMUEL | R | NA | MALE | IV |
LONG IV | FLOYD | M | NA | MALE | IV |
THOMPSON IV | HARRY | M | NA | MALE | IV |
ANSELMENT JR | JOSEPH | LEONARD | NA | MALE | JR |
BALL JR | SAMUEL | LEE | NA | MALE | JR |
BARKLEY JR | CHARLES | W | NA | MALE | JR |
BENDER JR | JOHN | JOHN P | NA | MALE | JR |
BINGHAM JR. | AMES | EDMOND | NA | MALE | JR |
BIRCHFIELD JR | MILBURN | JOEL | NA | MALE | JR |
BLEDSOE JR | HOMER | BLAINE | NA | MALE | JR |
BROWN JR | ROBERT | A | NA | MALE | JR |
BUNDESMAN JR | BERNARD | B | NA | MALE | JR |
BYRD JR | HERBERT | L | NA | MALE | JR |
CAIL JR | MALCOLM | LEHOLMES | NA | MALE | JR |
CARRIER JR | ROBERT | WILSON | NA | MALE | JR |
CHAMBERS JR | KENNETH | RAY | NA | MALE | JR |
CHARLES JR | WILLIE | J | NA | MALE | JR |
CLAY JR | WILEY | WALTON | JR | MALE | JR |
CLAYTON JR | JAMES | D | NA | MALE | JR |
CULBRETH JR | WALTER | E | NA | MALE | JR |
DAYE JR. | JAMES | NA | JR | MALE | JR |
ENGLISH JR | WARREN | ROBERT | NA | MALE | JR |
EVANS JR | RALPH | NA | II | MALE | JR |
FAILLE JR | EDWARD | J | NA | MALE | JR |
FARMER JR | BENJAMIN | STEVE | NA | MALE | JR |
FRAZIER JR | JAMES | A | NA | MALE | JR |
GARCIA JR | FRANK | NA | NA | MALE | JR |
HALL JR | JAMES | B | NA | MALE | JR |
HARDIN JR | CHARLES | ELMORE | NA | MALE | JR |
HARGRAVES JR | JAMES | CALVIN | NA | MALE | JR |
HARRIS JR | CHAMP | NA | NA | MALE | JR |
HAWKINS JR | REED | GREGORY | NA | MALE | JR |
HENSLEY JR | LAWRENCE | G | NA | MALE | JR |
HERNDON JR | EVERETT | GEORGE | NA | MALE | JR |
HILL JR | JAMES | C | NA | MALE | JR |
HOYLE JR | GEORGE | A | NA | MALE | JR |
HUMPHRIES JR | DONNIE | R | NA | MALE | JR |
KENNEDY JR | THOMAS | E | NA | MALE | JR |
KUBU JR | JERRY | JOHN | NA | MALE | JR |
LANE JR | DAVID | C | NA | MALE | JR |
LAWRENCE JR | HARRY | NA | NA | MALE | JR |
MARBLE JR | ROBERT | STERLING | NA | MALE | JR |
MCCLURE JR | DONALD | R | NA | MALE | JR |
MCGUIRE JR | JOHN | M | NA | MALE | JR |
MONGIOVI JR | ANTHONY | B | NA | MALE | JR |
MOORE JR | HARRY | GRADY | NA | MALE | JR |
MORRISON JR | WILLIAM | EMERSON | NA | MALE | JR |
MOSES JR | MICHAEL | WILLIAM | NA | MALE | JR |
NASIFE JR | SAMUEL | NICHOLAS | NA | MALE | JR |
OUTLAND JR | HOWARD | BROWN | NA | MALE | JR |
OVERTON JR | ROBERT | ALLEN | NA | MALE | JR |
PARKS JR | JOEL | TIMOTHY | NA | MALE | JR |
PULSIFER JR | HAROLD | WINFRED | NA | MALE | JR |
REED JR | BRUCE | HAL | NA | MALE | JR |
ROBERTS JR | GEORGE | MARION | NA | MALE | JR |
RUSSELL, JR. | KERMITT | PATRICK | NA | MALE | JR |
SHADE JR | EVERETTE | LEE | NA | MALE | JR |
SHEALLY JR | WILLIAM | B | NA | MALE | JR |
ST JEAN JR | JOSEPH | NA | NA | MALE | JR |
STANSBERRY JR | DAVID | R | NA | MALE | JR |
STREETER JR | THOMAS | EARL | NA | MALE | JR |
VAN DOREN JR | EDWARD | FOSTER | NA | MALE | JR |
WHITEHOUSE JR | JOHN | JOSEPH | NA | MALE | JR |
WHITFIELD JR | RAYMOND | E | NA | MALE | JR |
WIEGOLD JR | RICHARD | MARTIN | NA | MALE | JR |
WILLIAMSON JR | SOLOMAN | J | NA | MALE | JR |
YOAKUM JR | JC | NA | NA | MALE | JR |
SMITH MD | PATRICIA | ANN | NA | FEMALE | MD |
VAN NAME | MARY | A | NA | FEMALE | NAME |
VAN NAME | NANCY | HIGGINS | NA | FEMALE | NAME |
VAN NAME | CHRISTOPHER | PAUL | NA | MALE | NAME |
VAN NAME | GARY | GEORGE | NA | MALE | NAME |
VAN NAME | MARK | L | NA | MALE | NAME |
BRAKE SR ESS | CAROLYN | G | NA | FEMALE | SR |
DOSS SR | MICHAEL | RAY | NA | MALE | SR |
HICKS SR | WILFORD | LYTLE | SR. | MALE | SR |
STIMSON SR | RICHARD | BARRETT | NA | MALE | SR |
VAUGHN SR | WALTER | S | NA | MALE | SR |
WHITWORTH SR | RANDY | SEAN | NA | MALE | SR |
V’SOSKE | ERIKA | DONNELL | NA | FEMALE | V |
MOODY V | WILLIE | HOLMES | NA | MALE | V |
TENNENT V | EDWARD | S | NA | MALE | V |
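Some rows in the table above match only because of the cleaning step that precedes the search: every non-letter is replaced by a space, so an apostrophe or hyphen splits a name into separate words. The sketch below reproduces that pipeline on a small scale. The word list `words`, the helper `clean()`, and the `samples` vector are hypothetical and cover only a subset of the special words; `paste0()` is used in place of `glue` purely to keep the example short.

```r
# Hypothetical subset of the special words, including the repeated-letter pattern
words <- c("DR", "JR", "II", "I", "NAME", "([A-Z])\\1{2,}")

# Wrap each word in word boundaries and join the alternatives with "|"
pattern <- paste0("\\b", words, "\\b", collapse = "|")

# Same cleaning steps as in the analysis code
clean <- function(x) {
  x <- stringr::str_to_upper(x)
  x <- stringr::str_replace_all(x, "[^ A-Z]", " ")  # non-letters become spaces
  stringr::str_squish(x)                            # collapse repeated spaces
}

# Sample values, some modelled on rows shown above
samples <- c(
  "WILLIAMSON DR",    # genuine trailing honorific
  "ANDREWS",          # contains "DR" but not as a whole word
  "I'ANSON-JACKSON",  # apostrophe becomes a space, leaving "I" as a word
  "VAN NAME",         # valid surname that contains a special word
  "AAA",              # same letter three times
  "RUSSELL, JR."      # punctuation around the suffix is stripped
)

matches <- stringr::str_extract(clean(samples), pattern)
data.frame(samples, cleaned = clean(samples), matches)
```

The word boundaries stop "DR" from matching inside "ANDREWS", but they cannot distinguish the initial in "I'ANSON" from a generation suffix, which is why single-letter matches need the visual check.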
I eyeballed the results and removed from the search list the words that appeared to be mostly used validly.
Invalid words:
# regular expression to match words
w_regexp <-
  c(w_hons, w_gen, w_spec, w_test) %>% # all special words
  unique() %>% # make it a set
  dplyr::setdiff( # remove words that appear to mostly be validly used
    c(
      "BISHOP",
      "BROTHER",
      "DOCTOR",
      "ELDER",
      "JUNIOR",
      "MASTER",
      "MISTER",
      "PASTOR",
      "PROFESSOR"
    )
  ) %>%
  glue::glue(x = ., "\\b{x}\\b") %>% # must be words
  glue::glue_collapse(sep = "|") # search for any
x <- d %>%
  dplyr::mutate(
    match =
      first_name %>%
      stringr::str_to_upper() %>%
      stringr::str_replace_all(pattern = "[^ A-Z]", replacement = " ") %>%
      stringr::str_squish() %>%
      stringr::str_extract(pattern = w_regexp)
  ) %>%
  dplyr::filter(!is.na(match))
nrow(x)
[1] 328
x %>%
  dplyr::arrange(match, sex, last_name, first_name) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex | match |
---|---|---|---|---|---|
AAL-ANUBIAIMHOTE | DR NGOZI | NA | NA | FEMALE | DR |
WILMES | FATHER | JAMES | NA | MALE | FATHER |
ARMOUR | I ELISABETH | NA | NA | FEMALE | I |
BOYKIN-PRIDE | I’MAN | BRIANN | NA | FEMALE | I |
BRADLEY | I-ASIA | VICTORIA-CHERIS | NA | FEMALE | I |
BRITTON | I-CHI | GUO | NA | FEMALE | I |
BROOME | I SONYA | TIERRA | NA | FEMALE | I |
BULLARD | NANCY I | W | NA | FEMALE | I |
CARLYLE | I | E | NA | FEMALE | I |
CARTER | I JEANNETTE | GUILBE | NA | FEMALE | I |
CHANG | I-WEN | NA | NA | FEMALE | I |
COLEMAN | JANA’I | D | NA | FEMALE | I |
CSEH | MON I | WANG | NA | FEMALE | I |
DESROSIERS | I | DARLENE | NA | FEMALE | I |
DOSHI | I | NA | NA | FEMALE | I |
ERVIN | I MEI | CHOU | NA | FEMALE | I |
GAYE | I COLEEN | M | NA | FEMALE | I |
GLASPIE | I | CHARLOTTE | NA | FEMALE | I |
GREEN | I | A | NA | FEMALE | I |
HALL | I’MESHA | L | NA | FEMALE | I |
HEYWARD | I | CARTER | NA | FEMALE | I |
HU | EDNA I | JEN | NA | FEMALE | I |
HUNEYCUTT | I | SUZANNE EUDY | NA | FEMALE | I |
JAN | I-RAN | HO | NA | FEMALE | I |
KUEHR-MCLAREN | I | WENDY | NA | FEMALE | I |
LANE | I | E MRS | NA | FEMALE | I |
LEWIS | LEISA I | OITERONG | NA | FEMALE | I |
MARTIN | I | MARY | NA | FEMALE | I |
MENG | CHENG-I | C | NA | FEMALE | I |
MOORING | I’RISHA | ORCHA’ | NA | FEMALE | I |
MORRIS | I LANE | NA | NA | FEMALE | I |
MULLIS | LISA I | BELL | NA | FEMALE | I |
NEAL | I | M | NA | FEMALE | I |
PERRY | I | SUN | NA | FEMALE | I |
POPE | I-ASIA | COX | NA | FEMALE | I |
ROSS | I | G MRS | NA | FEMALE | I |
SAUNDERS | VICKI I | SUTTON | NA | FEMALE | I |
SHERWOOD | I-LI | BETH | NA | FEMALE | I |
SIMMONS | I-EESHA | D | NA | FEMALE | I |
SIU | I-MEI | NA | NA | FEMALE | I |
SOUTHERLAND | I | KATHLEEN | NA | FEMALE | I |
SUMMEY | I | V | NA | FEMALE | I |
SUTARIA | DEBORAH I | S | NA | FEMALE | I |
TAI | CHIH-I | NA | NA | FEMALE | I |
TUTTLE | I | BESSIE | NA | FEMALE | I |
WASHINGTON | CAROLINE I | HALEY | NA | FEMALE | I |
WEIR | I | SUN | NA | FEMALE | I |
WILEY | I’AIESHA | SHANTEA | NA | FEMALE | I |
WOOD | I | F | JR | FEMALE | I |
ARNOLD | I | B | NA | MALE | I |
BATES | I | C | NA | MALE | I |
BREWER | I | V | NA | MALE | I |
CALDWELL | I | M | NA | MALE | I |
CHEO | I-DEH | NA | NA | MALE | I |
CLARKE | I | MITCHELL | NA | MALE | I |
COLLEY | I | D | NA | MALE | I |
DOWNS | I V | NA | NA | MALE | I |
EDWARDS | I | J | JR | MALE | I |
FERGUSON | I | M | JR | MALE | I |
FU | I-KONG | BATOR | NA | MALE | I |
GORDON | I | BRYCE | NA | MALE | I |
GUNTER | I | W | NA | MALE | I |
HAIG | I REID | S | NA | MALE | I |
HICKS | I FAISON | NA | NA | MALE | I |
HINES | I | ALAN | NA | MALE | I |
HOOD | I | G | NA | MALE | I |
HOWARD | I | CLARENCE | NA | MALE | I |
JENKINS | I | D | III | MALE | I |
JOHNSON | I | M | NA | MALE | I |
JOHNSTON | I | C | NA | MALE | I |
KELLY | I | PERRY | NA | MALE | I |
KINLAW | I | W | NA | MALE | I |
LAKE | I | BEVERLY | JR | MALE | I |
LITTLE | I | MAYO | JR | MALE | I |
LONGMUIR | I | S | NA | MALE | I |
LYONS | I | CHARLES | NA | MALE | I |
MANESS | I | M | NA | MALE | I |
MCNEIL | I | J | NA | MALE | I |
MILLER | I | J | NA | MALE | I |
MILLER | I | D MCGILVRAY | NA | MALE | I |
PALMER | I | JEREMIAH | NA | MALE | I |
PATTERSON | I | EUGENE | NA | MALE | I |
PAUL | I | B | NA | MALE | I |
PLYLER | I | F | JR | MALE | I |
POPE | I | H | JR | MALE | I |
POWELL | I | HILL | NA | MALE | I |
POWELL | STEVEN I | VANROOY | NA | MALE | I |
QUINN | I | J | NA | MALE | I |
QUINN | I | J | JR | MALE | I |
RUSS | I | V | NA | MALE | I |
SMITH | I | BRUCE | NA | MALE | I |
SMITH | I | MELVIN | NA | MALE | I |
SOLOMON | I | S | NA | MALE | I |
STONE | I | L | NA | MALE | I |
TERRY | I | B | III | MALE | I |
TRAVIS | I | A | NA | MALE | I |
WAKEFIELD | I | NELSON | NA | MALE | I |
WALLACE | I | J | NA | MALE | I |
WARREN | I | NA | NA | MALE | I |
WU | I-CHAN | JOHN | NA | MALE | I |
MANUEL | WALTER III | NA | NA | MALE | III |
MCPHERSON | VAN III | NA | NA | MALE | III |
NASH | SAMUEL III | NA | NA | MALE | III |
PATALANO | LOUIS III | NA | NA | MALE | III |
SCOTT | CALVIN III | NA | NA | MALE | III |
SILVER | III | HAYDEN | NA | MALE | III |
COPELAND | IV | EDWARD JAMES | NA | MALE | IV |
ANDERSON | ELBERT JR | NA | NA | MALE | JR |
BARNEY | LEO JR | NA | NA | MALE | JR |
BOWLES | ROBERT JR | NA | NA | MALE | JR |
BRYANT | FREDDIE JR | NA | NA | MALE | JR |
COLLINS | JACK JR | NA | NA | MALE | JR |
DARRELL | JAMES JR | NA | NA | MALE | JR |
DAVIS | HENRY JR | NA | NA | MALE | JR |
GERTZ | JR | RICHARD | NA | MALE | JR |
HOAGLAND | JR | SANDY | NA | MALE | JR |
HOLLEY | JR | JOHN MARSHAL | NA | MALE | JR |
JONES | JR | MICHAEL | NA | MALE | JR |
JOYNER | JR | EARNEST | NA | MALE | JR |
MCADAMS | WILL,JR | NA | NA | MALE | JR |
MCCLELLAND | ERNEST JR | NA | NA | MALE | JR |
MCCOY | JR | RICHARD TUNN | NA | MALE | JR |
MCIVER | SIM JR | NA | NA | MALE | JR |
MCLEOD | WILLIE JR | NA | NA | MALE | JR |
MULL | MADISON JR | NA | NA | MALE | JR |
PALMS | DONALD JR | NA | NA | MALE | JR |
PEOPLES | LONZO JR | NA | NA | MALE | JR |
ROSADO | ALEJANDRO JR | NA | NA | MALE | JR |
THOMPSON | JOSEPHUS JR | NA | NA | MALE | JR |
TILLMAN | BENNIE JR | NA | NA | MALE | JR |
TOOLE | JR | NA | NA | MALE | JR |
WOODS | HOUSTON JR | NA | NA | MALE | JR |
STOCKELL | MD | COOPER | III | MALE | MD |
SPEIGHT | MISS STEPHANI | RENEE’ | NA | FEMALE | MISS |
FATE | MR | NA | NA | MALE | MR |
KANE | MR | NA | NA | MALE | MR |
BECK | MRS WILLIAM | E | NA | FEMALE | MRS |
BINGMAN | GRAY MRS | NA | NA | FEMALE | MRS |
BURKE | MRS GEORGE | W | NA | FEMALE | MRS |
CARTER | PAUL MRS | NA | JR | FEMALE | MRS |
CHATMAN | MRS H | L | NA | FEMALE | MRS |
COVINGTON | EDNA(MRS | PERRY, JR) | NA | FEMALE | MRS |
CROMER | BETTY MRS | A | NA | FEMALE | MRS |
DAVENPORT | MRS H | T | NA | FEMALE | MRS |
DODSON | RAY MRS | NA | NA | FEMALE | MRS |
EATON | MRS JOHN | C | NA | FEMALE | MRS |
ESTES | ALMA MRS | A | NA | FEMALE | MRS |
FIELDS | MRS G | CLINTON | NA | FEMALE | MRS |
FIELDS | MRS JAMES | C | NA | FEMALE | MRS |
FULP | JAMES MRS | C | NA | FEMALE | MRS |
GIBSON | H MRS | L | NA | FEMALE | MRS |
GOOLSBY | EUGENE MRS | NA | NA | FEMALE | MRS |
GURGANIOUS | JOHN MRS | HALLIE | NA | FEMALE | MRS |
HAMRICK | JOHN R MRS | MARGARET | NA | FEMALE | MRS |
HARRIS | MRS FRED | W | NA | FEMALE | MRS |
HARRIS | MRS P | D | NA | FEMALE | MRS |
HARRIS | MRS WILLIAM | W | NA | FEMALE | MRS |
HARTIS | FRANK E MRS | THAMES | NA | FEMALE | MRS |
HOLLIDAY | MRS JOSEPH | NA | NA | FEMALE | MRS |
JEFFERSON | MRS ATHOL | G | NA | FEMALE | MRS |
JOHNSON | MRS CLYDE | W | NA | FEMALE | MRS |
LAMB | WILSON MRS | C | NA | FEMALE | MRS |
LARIMORE | WILLIAM MRS | NA | NA | FEMALE | MRS |
LUU | MRS | NA | NA | FEMALE | MRS |
MABE | STEVE MRS | NA | NA | FEMALE | MRS |
MARTIN | JAMES MRS | H | NA | FEMALE | MRS |
MASSAGEE | JAMES H MRS | SUE | NA | FEMALE | MRS |
MOODY | MRS WILLARD | W | NA | FEMALE | MRS |
MORGAN | MRS ROY | A | NA | FEMALE | MRS |
NICHOLS | DORIS ( MRS W | NA | NA | FEMALE | MRS |
POPE | MRS O | N | JR | FEMALE | MRS |
REICH | MRS LESTER | G | NA | FEMALE | MRS |
RHONEY | ROBERT MRS | T | NA | FEMALE | MRS |
RIVES | MRS WILBUR | A | NA | FEMALE | MRS |
SCALES | BETTY MRS | H | NA | FEMALE | MRS |
SMITH | MRS WILLIAM JOE | DAVIS | NA | FEMALE | MRS |
TIMMONS | THOMAS MRS | E | NA | FEMALE | MRS |
TRULL | JAMES MRS | T | NA | FEMALE | MRS |
WARD | MARVIN MRS | M | NA | FEMALE | MRS |
WHITE | JOE MRS | MRS | NA | FEMALE | MRS |
WOODLEY | MRS WALLACE | ( RUTH ) | NA | FEMALE | MRS |
QUEEN | GERALDINE(NMN | NA | NA | FEMALE | NMN |
BORDERS | EUGENE(NMN) | NA | NA | MALE | NMN |
FOSTER | OTIS(NMN) | JR | NA | MALE | NMN |
FEATHERSTONE | REV. ROBERT | A | NA | MALE | REV |
GILDEA | SISTER | THERESINE | NA | FEMALE | SISTER |
KELLY | SISTER | ANN | NA | FEMALE | SISTER |
PEGUESE | SISTER | GIRTRUE | NA | FEMALE | SISTER |
ROSS | SISTER | S | NA | FEMALE | SISTER |
TANCRAITOR | SISTER MAXINE | ELIZABETH | NA | FEMALE | SISTER |
DUNTON | JULIAN SR | NA | NA | MALE | SR |
GRAHAM | STEPHEN SR | LEGREE | NA | MALE | SR |
PHILLIPS | SR | DAYLE KELLEY | NA | MALE | SR |
ADAMS | V | JAN | NA | FEMALE | V |
ANDERSON | V | RUTH K | NA | FEMALE | V |
BATKIN | V | MARIA | NA | FEMALE | V |
BENFIELD | RHONDA V | NA | NA | FEMALE | V |
BOWDEN | V | RUTH | NA | FEMALE | V |
BOYD | V | MARIE | NA | FEMALE | V |
BRANDT | V | KATHLEEN GRY | NA | FEMALE | V |
CALHOUN | V | ANNE | NA | FEMALE | V |
CARLAND | V | ANN | NA | FEMALE | V |
CARTER | PAUL V | MRS | NA | FEMALE | V |
CAVENDER | V | DORIS | NA | FEMALE | V |
COOK | INEZ V | CARY | NA | FEMALE | V |
DALBERG | V | ANDREA | NA | FEMALE | V |
DOTY | V’ONA | GILBERT | NA | FEMALE | V |
EDWARDS | V | ERLINE | NA | FEMALE | V |
EVANS-SMITH | V MARIE | HUMPHERY | NA | FEMALE | V |
FINLEY | V | ANNE | NA | FEMALE | V |
FUTRELL | V JEANINE | BOWDEN | NA | FEMALE | V |
GIBBS | V | WILLA | NA | FEMALE | V |
GLENN | V’SHATAVIA | D | NA | FEMALE | V |
HALL | CATHEDRIA V | HOOKER | NA | FEMALE | V |
HALL | V | JUANITA | NA | FEMALE | V |
HAMILTON | V KAYE | NA | NA | FEMALE | V |
JAYANTY | LAKSHMI S V | S | NA | FEMALE | V |
JOHNSON | V | JOLINE | NA | FEMALE | V |
KENNEDY | V0NCIEAL | LEE | NA | FEMALE | V |
KRITES | V | C | NA | FEMALE | V |
LANCASTER | ALDA V | LIMBAUGH | NA | FEMALE | V |
LEE | V | JUANITA | NA | FEMALE | V |
LEE | V | FLORENCE | NA | FEMALE | V |
LYONS | V | BETTIE | NA | FEMALE | V |
MARSHALL | CALLIE V. | LUTZ | NA | FEMALE | V |
MAYBERRY | V | JACQUELINE | NA | FEMALE | V |
MOCK | V CHARLENE | D | NA | FEMALE | V |
MOORMAN | V | E | NA | FEMALE | V |
MORTON | SANDRA V | GOSNELL | NA | FEMALE | V |
OSLEY | V | BONITA NAFZIGER | NA | FEMALE | V |
OWENSBY | V | ANN | NA | FEMALE | V |
PAYNE | V | LUCILLE | NA | FEMALE | V |
PERERA | V | MALLIKA | NA | FEMALE | V |
POWELL | V | ESTELLE | NA | FEMALE | V |
RASH | V | ANDERSON | NA | FEMALE | V |
RAY | V | FRANCIS | NA | FEMALE | V |
SEMONCHE | LAURA V | A | NA | FEMALE | V |
SHAFFER | V LYNNE | STRICKLAND | NA | FEMALE | V |
SHELF | V | S MRS | NA | FEMALE | V |
SMELTZER | V | DIANE | NA | FEMALE | V |
SMITH | V | RAE | NA | FEMALE | V |
STANTON | V | GAYLE | NA | FEMALE | V |
STERLING | V | LEE | NA | FEMALE | V |
STODDARD | V | CHRISTIVE | NA | FEMALE | V |
STREIFF | CONNIE V | R | NA | FEMALE | V |
TEAGUE | V | MICHELLE | NA | FEMALE | V |
TERRY | CAROLYN V | MASK | NA | FEMALE | V |
THOMPSON | V | DELORES | NA | FEMALE | V |
TINNEY | V | LEE W | NA | FEMALE | V |
VANNOY | V | GAIL | NA | FEMALE | V |
WAGGONER | V | C | NA | FEMALE | V |
WALKER | V | FRANCES | NA | FEMALE | V |
WHITE | V CAROLE | NA | NA | FEMALE | V |
WILLIAMS | JACQUELYNE V. | MOORE | NA | FEMALE | V |
WRIGHT | O V | LEDFORD | NA | FEMALE | V |
ADAMS | A V | NA | NA | MALE | V |
ADAMS | V | WAYNE | NA | MALE | V |
ALLEN | V | B | NA | MALE | V |
AVVA | V | SARMA | NA | MALE | V |
BARBOUR | V | KEITH | NA | MALE | V |
BAZEMORE | V | S | NA | MALE | V |
BOWMAN | V | C | NA | MALE | V |
BOYKIN | V | RAYMOND | JR | MALE | V |
CLINE | V | OTHO | JR | MALE | V |
CORRELL | V | C | NA | MALE | V |
DEAL | R V | ROB | NA | MALE | V |
DEHART | V | L | JR | MALE | V |
DREYER | V | DEAN | NA | MALE | V |
GORDON | V | H | NA | MALE | V |
HELTON | V JOHNNY | NA | NA | MALE | V |
HICKS | V | L | NA | MALE | V |
HOLLAND | V | L | NA | MALE | V |
HOLLINSHED | V | E | JR | MALE | V |
HONEYCUTT | V | J | NA | MALE | V |
HOUSEHOLDER | V | R | NA | MALE | V |
IDOL | V | F | NA | MALE | V |
IRAGGI | V | J | NA | MALE | V |
IYER | V V | NA | NA | MALE | V |
JACKSON | V | L | NA | MALE | V |
JEFFRIES | V’GER | S | NA | MALE | V |
JONES | V | W | NA | MALE | V |
KRASNIEWICZ | V | A | NA | MALE | V |
KRITES | V | C | NA | MALE | V |
KRYSTOFIAK | V | L | NA | MALE | V |
LEWIS | V | M | NA | MALE | V |
LIND | V WILLIAM | NA | JR | MALE | V |
LOCKAMY | V | B | NA | MALE | V |
LOMBARDI | V ALAN | NA | NA | MALE | V |
MANGIPUDI | V RAO | NA | NA | MALE | V |
MANN | R V | NA | NA | MALE | V |
MARTIN | V | GRAY | JR | MALE | V |
MATHENY | V | O | JR | MALE | V |
MCKINNEY | V | A | NA | MALE | V |
MODLIN | V | WAYNE | NA | MALE | V |
NORMAN | V WAYNE | NA | NA | MALE | V |
OAKLEY | V | BRADSHER | III | MALE | V |
OATES | A V | NA | NA | MALE | V |
OGLESBY | V | BOYCE | JR | MALE | V |
PFAHL | V KEVIN | NA | NA | MALE | V |
PIERANNUNZI | V PAUL | NA | NA | MALE | V |
PLAYER | V | STEPHEN | NA | MALE | V |
POWELL | V | A | JR | MALE | V |
RASH | A V | NA | NA | MALE | V |
REDMOND | V | PRESTON | JR | MALE | V |
REVELS | V D | NA | NA | MALE | V |
REYNOLDS | V | FRANK | NA | MALE | V |
RUMLEY | V | CLIFTON | NA | MALE | V |
SCALDARA | A V | NA | NA | MALE | V |
SHIELDS | V | E | NA | MALE | V |
SLADE | V | T | NA | MALE | V |
TEMPLE | V | W | NA | MALE | V |
WARD | V | STUART | JR | MALE | V |
WHITE | A V | NA | JR | MALE | V |
WHITSON | V | L | NA | MALE | V |
WOOTEN | V | ALDENE | NA | MALE | V |
WYATT | V | CHARLES | NA | MALE | V |
ANTHONY | VI | JOHNSON | NA | FEMALE | VI |
DO | VI | THUY | NA | FEMALE | VI |
GREENE | VI | HEGE | NA | FEMALE | VI |
HUTCHINSON | VI | THI | NA | FEMALE | VI |
LAI | VI | LE | NA | FEMALE | VI |
NGUYEN | VI | THOAI | NA | FEMALE | VI |
NGUYEN | VI | TUONG | NA | FEMALE | VI |
TOWNSEND | VI | S | NA | FEMALE | VI |
VO | VI | PHUONG | NA | FEMALE | VI |
GALLOWAY | VI CKY | RONALD | NA | MALE | VI |
THAI | VI | KY | NA | MALE | VI |
TRAN | VI | TAN | NA | MALE | VI |
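With several hundred matching rows, a tally per matched word can make it easier to see which words dominate before reading the table row by row. The following is a minimal sketch of that idea, not a step from the original analysis; `toy` is a hypothetical stand-in for the filtered data frame `x`, and `tally` is a name introduced here.

```r
# Hypothetical stand-in for `x`: only the columns needed for the tally
toy <- data.frame(
  first_name = c("I", "I-WEN", "V", "MRS JOHN", "V KAYE", "LEO JR", "I", "JAMES MRS"),
  match      = c("I", "I",     "V", "MRS",      "V",      "JR",     "I", "MRS"),
  stringsAsFactors = FALSE
)

# One row per matched word, most frequent first
tally <- dplyr::count(toy, match, sort = TRUE)
tally
```

Words with large counts, such as the single letters "I" and "V" in the table above, are then obvious candidates for closer inspection or for exclusion from the search list.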
I eyeballed the results and removed from the search list the words that appeared to be mostly used validly.
Invalid words:
# regular expression to match words
w_regexp <-
  c(w_hons, w_gen, w_spec, w_test) %>% # all special words
  unique() %>% # make it a set
  dplyr::setdiff( # remove words that appear to mostly be validly used
    c(
      "BISHOP",
      "BLIND",
      "BR",
      "BROTHER",
      "DOCTOR",
      "ELDER",
      "FIRST",
      "JR", # invalid & too many to display
      "JUNIOR",
      "MASTER",
      "MISTER",
      "MRS", # invalid & too many to display
      "NMN", # invalid & too many to display
      "PASTOR",
      "SENIOR",
      "SISTER",
      "I",
      "V",
      "VI",
      "VOTER"
    )
  ) %>%
  glue::glue(x = ., "\\b{x}\\b") %>% # must be words
  glue::glue_collapse(sep = "|") # search for any
x <- d %>%
  dplyr::mutate(
    match =
      midl_name %>%
      stringr::str_to_upper() %>%
      stringr::str_replace_all(pattern = "[^ A-Z]", replacement = " ") %>%
      stringr::str_squish() %>%
      stringr::str_extract(pattern = w_regexp)
  ) %>%
  dplyr::filter(!is.na(match))
nrow(x)
[1] 98
x %>%
  dplyr::arrange(match, sex, last_name, first_name) %>%
  knitr::kable()
last_name | first_name | midl_name | name_sufx_cd | sex | match |
---|---|---|---|---|---|
WISE | DIANA | AKA | NA | FEMALE | AKA |
CACCAMO | KATHLEEN | DR | NA | FEMALE | DR |
DUNCAN | ROSALYN | DR | NA | FEMALE | DR |
GEORGE | AMAY | DR | NA | FEMALE | DR |
VANN | ELLEN | DR | NA | FEMALE | DR |
ELESHA | WILLIAM | DR | NA | MALE | DR |
ROBICSEK | FRANCIS | DR | NA | MALE | DR |
ROPER | THOMAS | E DR | NA | MALE | DR |
VETTER | JOHN | S DR | NA | MALE | DR |
BIRCHFIELD | HARRY | LYNN II | NA | MALE | II |
DINGMAN | LEONARD | ALAN II | NA | MALE | II |
FRADY | ROBERT | GLENN II | NA | MALE | II |
GLOVER | CHARLES | WORTH II | NA | MALE | II |
HAWKINS | ROGER | LARRY II | NA | MALE | II |
HUNTER | ERNEST | II | NA | MALE | II |
KELLY | DAVID | LEE II | NA | MALE | II |
KERR | JAMES | II | NA | MALE | II |
KUHNE | KURT | II | NA | MALE | II |
ROGERS | SYLVESTER | II | SR | MALE | II |
SHERWOOD | GEORGE | ROYALL II | NA | MALE | II |
SOGLUIZZO | JOSEPH | JOHN II | NA | MALE | II |
VAN GORDER | CHARLES | OSCAR II | NA | MALE | II |
WALSTON | CHARLES | EDWARD II | NA | MALE | II |
WATKINS | MONROE | II | NA | MALE | II |
YOUNGMAN | THOMAS | ARDEN II | NA | MALE | II |
BROWN | HARRY | III | NA | MALE | III |
BROWN | MILES | III | NA | MALE | III |
COOPER | DALTON | III | NA | MALE | III |
DAILEY | LANGRA | III | NA | MALE | III |
FUNDERBURK | TRAVIS | III | NA | MALE | III |
GADISON | NATHANIEL | III | NA | MALE | III |
GAY | ROBERT | HENRY, III. | NA | MALE | III |
GEE | LAWRENCE | III | NA | MALE | III |
HARPER | GUS | III | NA | MALE | III |
HOLT | ISAAC | III | NA | MALE | III |
HUMPHREY | ROLAND | M III | NA | MALE | III |
JOHNSON | SHADE | III | NA | MALE | III |
JOYNER | DOUGLAS | III | NA | MALE | III |
LYNCH | ABRAHAM | III | NA | MALE | III |
MCGILVERY | ROBERT | III | NA | MALE | III |
MCILWAIN | FERRY | III | NA | MALE | III |
PHILLIPS | ALEXANDER ROW | III | NA | MALE | III |
PRICE | PAUL | III | III | MALE | III |
STEELE | HARVEY | III | NA | MALE | III |
TERRY | GEORGE | III | NA | MALE | III |
THOMAS | PAUL | III | NA | MALE | III |
BAKER | LOUIS | IV | NA | MALE | IV |
CROSS | EUGENE | IV | NA | MALE | IV |
ESPOSITO | VINCENT | JOHN IV | NA | MALE | IV |
GUNNOE | ROBERT | FELIX IV | NA | MALE | IV |
HORNEY | HARRISON | MARTIN IV | NA | MALE | IV |
HUMBERT | JOHN | LAWRENCE IV | NA | MALE | IV |
BRONSON | JENNIFER | MD | NA | FEMALE | MD |
MCGIMSEY | JAMES | F JR MD | NA | MALE | MD |
BOLES | FAUSTINE | MISS | NA | FEMALE | MISS |
BREEZE | ALMA | EARL MISS | NA | FEMALE | MISS |
DAVIS | JULIA | MISS | NA | FEMALE | MISS |
GARBER | CORNELIA | MISS | NA | FEMALE | MISS |
HAM | MABLE | MISS | NA | FEMALE | MISS |
MCKOY | CAROL | MISS | NA | FEMALE | MISS |
MORRISON | LULA | MISS | NA | FEMALE | MISS |
MOSER | ROSE | MISS | NA | FEMALE | MISS |
PHILSON | CHERYL | MISS | NA | FEMALE | MISS |
ATKINS | DAVID | GLEN MR | NA | MALE | MR |
LIVENGOOD | THURMOND | MS | NA | FEMALE | MS |
STINTZI | MANDI | LY NN | NA | FEMALE | NN |
CRISSMAN | JASON | LY NN | NA | MALE | NN |
GREENE | LESTER | D(NN) | NA | MALE | NN |
LUKER | DANIEL | B(NN) | NA | MALE | NN |
JOHNSON | ROBERT | REV | NA | MALE | REV |
WORKMAN | NATHANIEL | REV | NA | MALE | REV |
ABBAS | MOHAMED | SR | NA | MALE | SR |
ANSTEAD | LENDELL | SR | NA | MALE | SR |
ANTHONY | EVERETT | SR | NA | MALE | SR |
ARMSTON | MILTON | SR | NA | MALE | SR |
ARRINGTON | LEROY | SR | NA | MALE | SR |
BATTLE | NATHANIEL | SR | NA | MALE | SR |
BERRY | RALPH | SR | NA | MALE | SR |
BROWN | NELSON | SR | NA | MALE | SR |
CARTER | FOREST | SR | NA | MALE | SR |
CLARK | JEFFERY | SR | NA | MALE | SR |
DEGRAFFENRIED | EDWARD | (NMN)SR | NA | MALE | SR |
EUBANKS | ALBERT | SR | NA | MALE | SR |
HARRIS | MARION | SR | NA | MALE | SR |
JOHNSON | FRED | ALAN SR | NA | MALE | SR |
JONES | WALTER | SR | NA | MALE | SR |
LANE | LORENZA | SR | NA | MALE | SR |
LUPTON | DENNIS | WAYNE SR | NA | MALE | SR |
LYNCH | LOUIS | SR | NA | MALE | SR |
MILLER | CLARENCE | SR | NA | MALE | SR |
OSBORNE | JOHN | SR | NA | MALE | SR |
PERAGINE | PAUL | SR | NA | MALE | SR |
SELLARS | LARRY | SR | NA | MALE | SR |
STRICKLAND | TIMOTHY | SR | NA | MALE | SR |
WHITAKER | WILLIAM | SR | NA | MALE | SR |
WHITNEY | WILLIAM | PRESTON SR | NA | MALE | SR |
WIGGINS | MINOR | SR | NA | MALE | SR |
WILLIAMS | ERVIN | W., SR., | NA | MALE | SR |
I inspected the results by eye and removed from the candidate list any words that appeared to be mostly valid name components.
Invalid words:
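The check behind the table above can be sketched as follows. This is a minimal illustration in base R, not necessarily the code used in this analysis. It assumes the final column of the table is the last alphabetic word of the middle name after punctuation is treated as a separator. The helper `last_word()` and the example vector `middle_name` are hypothetical names introduced here. The `invalid_words` vector contains only the words visible in the table rows shown above and is not the full list.

```r
# Hypothetical helper: return the last alphabetic word of each string,
# treating punctuation and any other non-letter characters as separators.
last_word <- function(x) {
  cleaned <- trimws(gsub("[^A-Za-z]+", " ", toupper(x)))
  out <- sub("^.* ", "", cleaned)            # drop everything up to the last space
  out[is.na(x) | cleaned == ""] <- NA_character_
  out
}

# Illustrative subset only: the words that appear in the table rows above.
invalid_words <- c("II", "III", "IV", "MD", "MISS", "MR", "MS", "NN", "REV", "SR")

# Example middle-name values, mostly copied from the table; "ANN" and NA are
# added (assumption) to show values that should not be flagged.
middle_name <- c("HENRY, III.", "W., SR.,", "D(NN)", "(NMN)SR", "F JR MD", "ANN", NA)

extracted <- last_word(middle_name)
flagged <- extracted %in% invalid_words

data.frame(middle_name, extracted, flagged)
```

Matching on the last word only is a design choice in this sketch: it catches values such as `HENRY, III.` and `(NMN)SR`, but it would miss an invalid word that is followed by another word.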
Computation time (excl. render): 461.625 sec elapsed
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.10
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] hexbin_1.28.2 glue_1.4.2 knitr_1.30 skimr_2.1.2
[5] fst_0.9.4 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.2
[9] purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.4
[13] ggplot2_3.3.3 tidyverse_1.3.0 tictoc_1.0 here_1.0.1
[17] workflowr_1.6.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 lattice_0.20-41 lubridate_1.7.9.2 utf8_1.1.4
[5] assertthat_0.2.1 rprojroot_2.0.2 digest_0.6.27 repr_1.1.0
[9] R6_2.5.0 cellranger_1.1.0 backports_1.2.1 reprex_0.3.0
[13] evaluate_0.14 highr_0.8 httr_1.4.2 pillar_1.4.7
[17] rlang_0.4.10 readxl_1.3.1 rstudioapi_0.13 whisker_0.4
[21] rmarkdown_2.6 labeling_0.4.2 munsell_0.5.0 broom_0.7.3
[25] compiler_4.0.3 httpuv_1.5.4 modelr_0.1.8 xfun_0.20
[29] base64enc_0.1-3 pkgconfig_2.0.3 htmltools_0.5.0 tidyselect_1.1.0
[33] bookdown_0.21 fansi_0.4.1 crayon_1.3.4 dbplyr_2.0.0
[37] withr_2.3.0 later_1.1.0.1 grid_4.0.3 jsonlite_1.7.2
[41] gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0 git2r_0.28.0
[45] magrittr_2.0.1 scales_1.1.1 cli_2.2.0 stringi_1.5.3
[49] farver_2.0.3 renv_0.12.5 fs_1.5.0 promises_1.1.1
[53] xml2_1.3.2 ellipsis_0.3.1 generics_0.1.0 vctrs_0.3.6
[57] tools_4.0.3 hms_0.5.3 parallel_4.0.3 yaml_2.2.1
[61] colorspace_2.0-0 rvest_0.3.6 haven_2.3.1