First, I load the required packages and data.
library(tidyverse)
library(magrittr)
library(bigrquery)
# Connect to the course BigQuery project
con <- DBI::dbConnect(drv = bigquery(),
                      project = "learnclinicaldatascience")

# Pull the notes and the gold standard table into local tibbles
diabetes_notes <- tbl(con, "course4_data.diabetes_notes") %>%
  collect()
goldstandard <- tbl(con, "course4_data.diabetes_goldstandard") %>%
  collect()
I need to describe my process using the following sections:
I will first explore the structure and contents of one or more of the notes to try to decide on a good approach.
c(diabetes_notes[1,])
## $NOTE_ID
## [1] 1
##
## $NOTE_TYPE
## [1] "History and Physical"
##
## $TEXT
## [1] "CHIEF COMPLAINT: Dog bite to his right lower leg.\n\nHISTORY OF PRESENT ILLNESS: This 50-year-old white male earlier this afternoon was attempting to adjust a cable that a dog was tied to. Dog was a German shepherd, it belonged to his brother, and the dog spontaneously attacked him. He sustained a bite to his right lower leg. Apparently, according to the patient, the dog is well known and is up-to-date on his shots and they wanted to confirm that. The dog has given no prior history of any reason to believe he is not a healthy dog. The patient himself developed a puncture wound with a flap injury. The patient has a flap wound also below the puncture wound, a V-shaped flap, which is pointing towards the foot. It appears to be viable. The wound is open about may be roughly a centimeter in the inside of the flap. He was seen by his medical primary care physician and was given a tetanus shot and the wound was cleaned and wrapped, and then he was referred to us for further assessment.\n\nPAST MEDICAL HISTORY (PMH): Significant for history of pulmonary fibrosis and atrial fibrillation. He is status post bilateral lung transplant back in 2004 because of the pulmonary fibrosis.\n\nALLERGIES: There are no known allergies.\n\nMEDS: Include multiple medications that are significant for his lung transplant including Prograf, CellCept, prednisone, omeprazole, Bactrim which he is on chronically, folic acid, vitamin D, Mag-Ox, Toprol-XL, calcium 500 mg, vitamin B1, Centrum Silver, verapamil, and digoxin.\n\nFAMILY HISTORY Consistent with a sister of his has ovarian cancer and his father had liver cancer. Heart disease in the patient's mother and father, and father also has diabetes and diabetic retinopathy.\n\nSOCIAL HISTORY: He is a non-cigarette smoker. He has occasional glass of wine. He is married. He has one biological child and three stepchildren. He works for ABCD.\n\nROS: He denies any chest pain. He does admit to exertional shortness of breath. 
He denies any GI or GU problems. He denies any bleeding disorders.\n\nPHYSICAL EXAMINATIONGENERAL: Presents as a well-developed, well-nourished 50-year-old white male who appears to be in mild distress.\n\nHEENT: Unremarkable.\n\nNECK: Supple. There is no mass, adenopathy or bruit.\n\nCHEST: Normal excursion.\n\nLUNGS: Clear to auscultation and percussion.\n\nCOR: Regular. There is no S3 or S4 gallop. There is no obvious murmur.\n\nABDOMEN: Soft. It is nontender. Bowel sounds are present. There is no tenderness.\n\nSKIN: He does have like a Chevron incisional scar across his lower chest and upper abdomen. It appears to be well healed and unremarkable.\n\nGENITALIA: Deferred.\n\nRECTAL: Deferred.\n\nEXTREMITIES: He has about 1+ pitting edema to both legs and they have been present since the surgery. In the right leg, he has an about midway between the right knee and right ankle on the anterior pretibial area, he has a puncture wound that measures about may be centimeter around that appears to be relatively clean, and just below that about may be 3 cm below, he has a flap traumatic injury that measures about may be 4 cm to the point of the flap. The wound is spread apart about may be a centimeter all along that area and it is relatively clean. There was some bleeding when I removed the dressing and we were able to pretty much control that with pressure and some silver nitrate. There were exposed subcutaneous tissues, but there was no exposed tendons that we could see, etc. The flap appeared to be viable.\n\nNEUROLOGIC: Without focal deficits. The patient is alert and oriented.\n\nIMPRESSION: A 50-year-old white male with dog bite to his right leg with a history of pulmonary fibrosis, status post bilateral lung transplant several years ago. He is on multiple medications and he is on chronic Bactrim. We are going to also add some fluoroquinolone right now to protect the skin and probably going to obtain an Infectious Disease consult. 
We will see him back in the office early next week to reassess his wound. He is to keep the wound clean with the moist dressing right now. He may shower several times a day."
Each note has a NOTE_ID, a NOTE_TYPE, and a TEXT field. Because some notes are divided into sections by body part, searching by note section would be difficult: I would have to apply a similar search strategy to several sections, which could be cumbersome. I will instead use a keyword window, which captures the text surrounding each keyword of interest.
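To make the keyword window idea concrete, here is a minimal base R sketch on a single invented sentence. The sentence, the function name `keyword_windows`, and the half window size of 3 are assumptions for illustration only; the helpers used for the real analysis appear later.

```r
# Toy note text, invented for illustration (not taken from the course data).
note <- "The patient denies chest pain. He has a history of diabetic neuropathy in both feet."

# Split into words on runs of spaces or newlines.
words <- strsplit(note, "[ \n]+")[[1]]

# Return the words within `half_window_size` positions of each keyword hit,
# clamping the window to the first and last word of the note.
keyword_windows <- function(words, keyword, half_window_size) {
  hits <- grep(keyword, words, ignore.case = TRUE)
  vapply(hits, function(i) {
    start <- max(1, i - half_window_size)
    end <- min(length(words), i + half_window_size)
    paste(words[start:end], collapse = " ")
  }, character(1))
}

windows <- keyword_windows(words, "neuropathy", half_window_size = 3)
print(windows)
```

The window is shorter than seven words here only because the keyword sits three words from the end of the note, which is exactly the clamping behaviour the real helpers implement with WINDOW_START and WINDOW_END.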
I will need to define the following:
diabetes_notes %>% nlp_example_datatable()
I could restrict the search to certain note types, but history and physical notes, operative notes, and discharge summaries are all likely to contain relevant information, so I will search all of them.
First, here are the helper functions that extract the windows around one keyword or around two adjacent keywords.
# Return one row per keyword hit, together with the surrounding window of words
extract_text_window <- function(dataframe, keyword, half_window_size) {
  dataframe %>%
    group_by(NOTE_ID) %>%
    mutate(WORDS = TEXT) %>%
    # One row per word, split on runs of spaces or newlines
    separate_rows(WORDS, sep = "[ \n]+") %>%
    mutate(INDEX = seq(from = 1, to = n(), by = 1.0),
           # Clamp the window to the first and last word of the note
           WINDOW_START = case_when(INDEX - half_window_size < 1 ~ 1,
                                    TRUE ~ INDEX - half_window_size),
           WINDOW_END = case_when(INDEX + half_window_size > max(INDEX) ~ max(INDEX),
                                  TRUE ~ INDEX + half_window_size),
           WINDOW = word(string = TEXT, start = WINDOW_START, end = WINDOW_END, sep = "[ \n]+")) %>%
    ungroup() %>%
    # Keep only the rows whose word matches the keyword
    filter(str_detect(string = WORDS, pattern = regex(keyword, ignore_case = TRUE)))
}
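# Same as extract_text_window(), but the keyword hit must be followed directly by a second keyword
extract_2words_text_window <- function(dataframe, keyword1, keyword2, half_window_size){
  dataframe %>%
    group_by(NOTE_ID) %>%
    mutate(WORDS = TEXT) %>%
    # One row per word, split on runs of spaces or newlines
    separate_rows(WORDS, sep = "[ \n]+") %>%
    mutate(INDEX = seq(from = 1, to = n(), by = 1.0),
           # Clamp the window to the first and last word of the note
           WINDOW_START = case_when(INDEX - half_window_size < 1 ~ 1,
                                    TRUE ~ INDEX - half_window_size),
           WINDOW_END = case_when(INDEX + half_window_size > max(INDEX) ~ max(INDEX),
                                  TRUE ~ INDEX + half_window_size),
           WINDOW = word(string = TEXT, start = WINDOW_START, end = WINDOW_END, sep = "[ \n]+")) %>%
    ungroup() %>%
    # Keep rows where this word matches keyword1 and the next row's word matches keyword2
    # (lead() runs after ungroup(), so it looks at the next row of the whole table)
    filter(str_detect(string = WORDS, pattern = regex(keyword1, ignore_case = TRUE)),
           str_detect(string = lead(WORDS), pattern = regex(keyword2, ignore_case = TRUE))) %>%
    # Shift the reported end index by one for the second keyword; WINDOW itself is not recomputed
    mutate(WINDOW_END = WINDOW_END + 1)
}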
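The adjacency test at the heart of the two-keyword helper can be illustrated on one invented sentence. This is a base R sketch of the same idea rather than the helper itself: the sentence and variable names are assumptions, and the character classes are written as `[a-zA-Z]`.

```r
# Toy sentence, invented for illustration.
words <- strsplit("Acute renal failure resolved. Adrenal disease was ruled out.", "[ \n]+")[[1]]

# Which words match the first and the second keyword (perl = TRUE enables lookarounds).
first <- grepl("(?<![a-zA-Z])(kidney|renal)(?![a-zA-Z])", words,
               ignore.case = TRUE, perl = TRUE)
second <- grepl("(?<![a-zA-Z])(disease|failure)(?![a-zA-Z])", words,
                ignore.case = TRUE, perl = TRUE)

# Shift `second` left by one so that position i says whether word i + 1 matches.
# Padding with FALSE means the last word can never pair with anything.
next_matches <- c(second[-1], FALSE)

# Positions where keyword1 is immediately followed by keyword2.
pair_start <- which(first & next_matches)
print(words[pair_start])
```

Only "renal failure" qualifies: "Adrenal disease" is rejected because the lookbehind sees a letter before "renal". Doing the shift separately for each note, as in this single-note sketch, also guarantees that the last word of one note is never paired with the first word of the next.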
I will try using the following words/phrases
I will likely need a medium-sized window around my keywords. I don’t necessarily expect the explanation of the complication to be directly next to the keyword, but it shouldn’t be too far away, so I will begin with a window size of 20 (half window size = 10).
I will be looking for indications of current complications as a result of diabetes. These include nerve damage such as tingling or pain, kidney damage, and retinal blood vessel damage. I will also make sure the information refers to the patient the note is about.
I will be looking for the following words:
I will work through my use of a regular expression to catch kidney failure
Here is my regular expression:
(?<![a-zA-Z])(kidney|renal)( disease| failure)(?![a-zA-z])
This expression uses a negative lookbehind and a negative lookahead to make sure that no letter directly precedes or follows my words. For example, it prevents “adrenal” from matching “renal”. Between the two lookarounds, it matches either “kidney” or “renal” (the medical term for kidney) followed by a space and either “disease” or “failure”.
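A quick way to check the lookarounds is to run the pattern over a few short strings. The strings below are invented for illustration, and the sketch uses base R `grepl()` with `perl = TRUE` rather than `str_detect()`. The code in this document writes the trailing character class as `[a-zA-z]`; the range `A-z` also covers a few punctuation characters such as `_`, so the sketch uses `[a-zA-Z]` and shows the difference at the end.

```r
pattern <- "(?<![a-zA-Z])(kidney|renal)( disease| failure)(?![a-zA-Z])"

# Invented test strings.
examples <- c("chronic kidney disease",   # plain match
              "acute renal failure.",     # punctuation after the phrase is allowed
              "adrenal failure",          # letter before "renal": lookbehind rejects it
              "renal failures",           # letter after "failure": lookahead rejects it
              "Renal Disease, stage 3")   # case-insensitive match

matches <- grepl(pattern, examples, ignore.case = TRUE, perl = TRUE)
print(data.frame(examples, matches))

# The range A-z spans more than letters: it includes the underscore, for example.
underscore_in_A_to_z <- grepl("[a-zA-z]", "_", perl = TRUE)
underscore_in_A_to_Z <- grepl("[a-zA-Z]", "_", perl = TRUE)
```

On clinical prose the two character classes will rarely behave differently, but `[a-zA-Z]` states the intent (letters only) exactly.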
This should catch the following:
Now I will test how well it works:
diabetes_notes %>%
mutate(KIDNEY_FAILURE = case_when(str_detect(string = TEXT, pattern = regex("(?<![a-zA-Z])(kidney|renal)( disease| failure)(?![a-zA-z])", ignore_case = T)) ~ 1,
TRUE ~ 0)) %>%
filter(KIDNEY_FAILURE == 1)
## # A tibble: 19 x 4
## NOTE_ID NOTE_TYPE TEXT KIDNEY_FAILURE
## <int> <chr> <chr> <dbl>
## 1 12 Operative Note "PREOPERATIVE DIAGNOSES1. End-stage … 1
## 2 13 Operative Note "PREOPERATIVE DIAGNOSES1. End-stage … 1
## 3 15 History and Ph… "CHIEF COMPLAINT: Right-sided weaknes… 1
## 4 21 Discharge Summ… "DIAGNOSIS: Refractory anemia that i… 1
## 5 27 History and Ph… "CHIEF COMPLAINT: Penile discharge, … 1
## 6 35 Discharge Summ… "REASON FOR CONSULTATION: Syncope.\n… 1
## 7 41 History and Ph… "REASON FOR VISIT: Acute kidney fail… 1
## 8 42 History and Ph… "HISTORY OF PRESENT ILLNESS: The pat… 1
## 9 48 Discharge Summ… "CHIEF COMPLAINT: Headache and pain i… 1
## 10 55 Discharge Summ… "Chief Complaint: Abdominal pain, nau… 1
## 11 62 History and Ph… "HISTORY OF PRESENT ILLNESS: This is… 1
## 12 69 Discharge Summ… "DIAGNOSES PROBLEMS:1. Orthostatic h… 1
## 13 82 Discharge Summ… "Chief Complaint: Back and hip pain.H… 1
## 14 108 Discharge Summ… "ADMISSION DIAGNOSIS: End-stage rena… 1
## 15 109 History and Ph… "REASON FOR CONSULTATION: Abnormal c… 1
## 16 112 History and Ph… "REASON FOR CONSULTATION: Renal fail… 1
## 17 120 History and Ph… "SUBJECTIVE: The patient is in compl… 1
## 18 123 History and Ph… "REASON FOR CONSULTATION: Management… 1
## 19 140 History and Ph… "HISTORY OF PRESENT ILLNESS: This 66… 1
This regex hits 19 notes. Let’s look at the contents of some of the notes.
diabetes_notes %>%
mutate(KIDNEY_FAILURE = case_when(str_detect(string = TEXT, pattern = regex("(?<![a-zA-Z])(kidney|renal)( disease| failure)(?![a-zA-z])", ignore_case = T)) ~ 1,
TRUE ~ 0)) %>%
filter(KIDNEY_FAILURE == 1) %>% nlp_example_datatable()
I notice that one of the notes includes a negation: “He denies any comorbid complications of the diabetes including kidney disease,…”. I will need to make sure this window is removed.
Now I will try the two-keyword window extraction.
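One way to flag negated mentions automatically is to look for a negation cue in the text that precedes the keyword. The sketch below is a crude illustration, not the approach used in this analysis: the cue list and the function name `is_negated` are assumptions, the first and third strings are quoted from notes discussed in this document, and the second string is invented.

```r
windows <- c(
  "He denies any comorbid complications of the diabetes including kidney disease,",
  "history of chronic kidney disease secondary to diabetes",  # invented example
  "Negative for coronary heart disease, hypertension, diabetes, or kidney disease"
)

# Assumed, deliberately short list of negation cues.
negation_cues <- "\\b(denies|no|not|negative for|without)\\b"
keyword <- "(kidney|renal) (disease|failure)"

# TRUE when a negation cue appears anywhere before the first keyword hit.
is_negated <- function(window) {
  hit <- regexpr(keyword, window, ignore.case = TRUE, perl = TRUE)
  if (hit == -1) return(FALSE)
  before <- substr(window, 1, hit - 1)
  grepl(negation_cues, before, ignore.case = TRUE, perl = TRUE)
}

negated <- vapply(windows, is_negated, logical(1), USE.NAMES = FALSE)
print(negated)
```

A rule this simple will over-trigger whenever an unrelated negation happens to sit earlier in the same window, so any flagged window would still need a manual look.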
diabetes_notes %>% extract_2words_text_window(keyword1 = "(?<![a-zA-Z])(kidney|renal)(?![a-zA-z])",
keyword2 = "(?<![a-zA-Z])(disease|failure)(?![a-zA-z])",
half_window_size = 10)
## # A tibble: 41 x 8
## NOTE_ID NOTE_TYPE TEXT WORDS INDEX WINDOW_START WINDOW_END WINDOW
## <int> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr>
## 1 12 Operative… "PREOPERA… renal 4 1 15 "PREOPERAT…
## 2 12 Operative… "PREOPERA… renal 12 2 23 "DIAGNOSES…
## 3 12 Operative… "PREOPERA… renal 27 17 38 "right bra…
## 4 13 Operative… "PREOPERA… renal 4 1 15 "PREOPERAT…
## 5 13 Operative… "PREOPERA… renal 20 10 31 "chronic a…
## 6 13 Operative… "PREOPERA… renal 61 51 72 "Michael C…
## 7 15 History a… "CHIEF CO… kidn… 250 240 261 "moderate …
## 8 21 Discharge… "DIAGNOSI… kidn… 40 30 51 "diabetes.…
## 9 27 History a… "CHIEF CO… renal 175 165 186 "as in the…
## 10 35 Discharge… "REASON F… kidn… 864 854 875 "this.4. …
## # … with 31 more rows
diabetes_notes %>% extract_2words_text_window(keyword1 = "(?<![a-zA-Z])(kidney|renal)(?![a-zA-z])",
keyword2 = "(?<![a-zA-Z])(disease|failure)(?![a-zA-z])",
half_window_size = 10) %>%
nlp_example_datatable()
Good, the extract function found the same 19 notes in 41 total windows.
Now it is time to apply the keyword window text identification to identify patients who have diabetic complications of neuropathy, nephropathy, and/or retinopathy.
To do this, I will extract the notes for each of the three complications separately and then merge them.
neuropathy <- diabetes_notes %>% extract_text_window(keyword = "(?<![a-zA-Z])neuropathy(?![a-zA-z])", half_window_size = 10)
nerve_pain <- diabetes_notes %>% extract_2words_text_window(keyword1 = "(?<![a-zA-Z])(nerve)(?![a-zA-z])",
keyword2 = "(?<![a-zA-Z])(pain)(?![a-zA-z])",
half_window_size = 10)
neuropathy <- rbind(neuropathy, nerve_pain)
neuropathy %>% nlp_example_datatable()
neuropathy %>%
summarise(unique(NOTE_ID))
## # A tibble: 19 x 1
## `unique(NOTE_ID)`
## <int>
## 1 3
## 2 7
## 3 21
## 4 24
## 5 27
## 6 30
## 7 34
## 8 37
## 9 38
## 10 52
## 11 61
## 12 71
## 13 82
## 14 86
## 15 97
## 16 118
## 17 126
## 18 130
## 19 18
We get 19 unique notes and 29 relevant windows.
In note 21, the text that shows negation (“denies comorbid complications”) lies outside the window. I would need either to remove it by hand or to increase the window size.
In note 38, the neuropathy is likely due to a condition other than diabetes (thrombocythemia), so I will need to exclude this one as well.
After looking through the rest of the notes, increasing the window size is not worthwhile, because no other note has important information just outside the window. I will just keep in mind that note 21 is incorrectly classified. For now I will deal only with the second case above and exclude neuropathy due to thrombocythemia. The filter below searches the full note TEXT rather than the WINDOW, so it drops every window from any note that mentions thrombocythemia.
neuropathy_filtered <- neuropathy %>%
mutate(EXCLUDE = case_when(str_detect(string = TEXT, pattern = regex("(?<![a-zA-Z])thrombocythemia(?![a-zA-z])", ignore_case = T)) ~ 1,
TRUE ~ 0)) %>%
filter(EXCLUDE == 0)
neuropathy_filtered %>%
summarise(unique(NOTE_ID))
## # A tibble: 18 x 1
## `unique(NOTE_ID)`
## <int>
## 1 3
## 2 7
## 3 21
## 4 24
## 5 27
## 6 30
## 7 34
## 8 37
## 9 52
## 10 61
## 11 71
## 12 82
## 13 86
## 14 97
## 15 118
## 16 126
## 17 130
## 18 18
neuropathy_filtered %>% nlp_example_datatable()
Good, the filter properly removed note 38.
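Because the exclusion above is matched against TEXT, it works at the note level. The sketch below contrasts that with matching against WINDOW on a small hypothetical data frame; the rows are invented and only mimic the column names used here.

```r
# Hypothetical windows: note 1 mentions thrombocythemia once, but has two neuropathy windows.
toy_windows <- data.frame(
  NOTE_ID = c(1, 1, 2),
  TEXT = c("essential thrombocythemia with neuropathy; separate diabetic neuropathy of the feet",
           "essential thrombocythemia with neuropathy; separate diabetic neuropathy of the feet",
           "long-standing diabetic neuropathy"),
  WINDOW = c("essential thrombocythemia with neuropathy",
             "separate diabetic neuropathy of the feet",
             "long-standing diabetic neuropathy"),
  stringsAsFactors = FALSE
)

# Note-level exclusion: any note whose full text mentions the term loses all its windows.
kept_by_text <- toy_windows[!grepl("thrombocythemia", toy_windows$TEXT, ignore.case = TRUE), ]

# Window-level exclusion: only the windows that mention the term are dropped.
kept_by_window <- toy_windows[!grepl("thrombocythemia", toy_windows$WINDOW, ignore.case = TRUE), ]

print(nrow(kept_by_text))
print(nrow(kept_by_window))
```

Note-level exclusion is the stricter choice: it is appropriate when the excluding condition explains every mention in the note, and too aggressive when a note contains both an unrelated and a diabetic mention.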
nephropathy <- diabetes_notes %>% extract_text_window(keyword = "(?<![a-zA-Z])nephropathy(?![a-zA-z])", half_window_size = 10)
kidney_failure <- diabetes_notes %>% extract_2words_text_window(keyword1 = "(?<![a-zA-Z])(kidney|renal)(?![a-zA-z])",
keyword2 = "(?<![a-zA-Z])(disease|failure)(?![a-zA-z])",
half_window_size = 10)
nephropathy <- rbind(nephropathy, kidney_failure)
nephropathy %>%
arrange(NOTE_ID) %>%
summarise(unique(NOTE_ID))
## # A tibble: 21 x 1
## `unique(NOTE_ID)`
## <int>
## 1 6
## 2 12
## 3 13
## 4 15
## 5 21
## 6 27
## 7 35
## 8 41
## 9 42
## 10 48
## # … with 11 more rows
nephropathy %<>% arrange(NOTE_ID)
nephropathy %>% nlp_example_datatable()
This keyword window search yielded 21 unique notes in 51 total windows.
Again, note 21 needs to be excluded using “denies any comorbid complications”.
In note 41, the doctor writes that he or she is concerned about the patient’s use of Chinese herbs, which can cause nephritis, and thinks this is “more likely that[than] diabetic nephropathy”. I should exclude this one as well.
In note 48, acute renal failure was observed, but it was due to tumor lysis syndrome rather than diabetes. I will exclude this as well.
Note 82 includes negation: “Negative for coronary heart disease, hypertension, diabetes, or kidney disease”.
Note 109 is incorrectly classified: the patient has no history of diabetes.
I also need to exclude the note that states “mother and father were on dialysis”, since this refers to family members rather than the patient.
nephropathy_filtered <- nephropathy %>%
mutate(EXCLUDE = case_when(str_detect(string = TEXT, pattern = regex("denies any comorbid complications", ignore_case = T)) ~ 1,
str_detect(string = TEXT, pattern = regex("more likely that diabetic nephropathy", ignore_case = T)) ~ 1,
str_detect(string = TEXT, pattern = regex("tumor lysis syndrome", ignore_case = T)) ~ 1,
str_detect(string = TEXT, pattern = regex("negative for coronary heart disease, hypertension, diabetes, or kidney disease", ignore_case = T)) ~ 1,
str_detect(string = TEXT, pattern = regex("mother and father were on dialysis", ignore_case = T)) ~ 1,
TRUE ~ 0)) %>%
filter(EXCLUDE == 0)
nephropathy_filtered %>%
summarise(unique(NOTE_ID))
## # A tibble: 16 x 1
## `unique(NOTE_ID)`
## <int>
## 1 6
## 2 12
## 3 13
## 4 15
## 5 27
## 6 35
## 7 42
## 8 51
## 9 55
## 10 62
## 11 69
## 12 108
## 13 109
## 14 120
## 15 123
## 16 140
nephropathy_filtered %>% nlp_example_datatable()
Good, the filter removed five notes (21, 41, 48, 82, and 112), so I now have 16 notes (down from 21) and a total of 34 windows. Note 109 is still included, because none of the exclusion phrases match it.
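As the list of exclusions grows, it can be easier to keep the phrases in a vector and test them in one place. The following base R sketch shows one way to do that; the function name `has_exclusion` and the three test strings are invented, while the phrases are the ones used in the filter above. Matching is done on lower-cased text with `fixed = TRUE`, so punctuation in a phrase is treated literally rather than as regex syntax.

```r
exclusion_phrases <- c(
  "denies any comorbid complications",
  "more likely that diabetic nephropathy",
  "tumor lysis syndrome",
  "negative for coronary heart disease, hypertension, diabetes, or kidney disease",
  "mother and father were on dialysis"
)

# TRUE for each text that contains at least one phrase (case-insensitive, literal match).
has_exclusion <- function(text, phrases) {
  text_lower <- tolower(text)
  flags <- lapply(tolower(phrases), function(p) grepl(p, text_lower, fixed = TRUE))
  Reduce(`|`, flags, rep(FALSE, length(text)))
}

# Invented example texts.
toy_text <- c("Acute renal failure due to tumor lysis syndrome.",
              "End-stage renal disease on dialysis.",
              "He DENIES any comorbid complications of the diabetes")

excluded <- has_exclusion(toy_text, exclusion_phrases)
print(excluded)
```

Phrases this specific are tuned to the notes at hand, so the list documents manual review decisions more than it generalises to new notes.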
retinopathy <- diabetes_notes %>% extract_text_window(keyword = "(?<![a-zA-Z])retinopathy(?![a-zA-z])", half_window_size = 10)
retinopathy %>%
summarise(unique(NOTE_ID))
## # A tibble: 5 x 1
## `unique(NOTE_ID)`
## <int>
## 1 1
## 2 21
## 3 86
## 4 94
## 5 136
retinopathy %>% nlp_example_datatable()
This keyword window search identified 5 notes in a total of 6 windows.
Note 1 (window 1) states “father also has diabetes and diabetic retinopathy”. This refers to the patient’s father, so I will need to remove it.
In note 21 there is negation: “No retinopathy”.
Note 94 talks about family history, not the current patient: “strong family history of diabetes and including diabetic complications of retinopathy”.
Note 136 also has negation: “does not show any evidence of diabetic retinopathy at this time”.
Now I will apply these exclusions.
retinopathy_filtered <- retinopathy %>%
mutate(EXCLUDE = case_when(str_detect(string = TEXT, pattern = regex("father also has diabetes and diabetic retinopathy", ignore_case = T)) ~ 1,
str_detect(string = TEXT, pattern = regex("no retinopathy", ignore_case = T)) ~ 1,
str_detect(string = TEXT, pattern = regex("strong family history of diabetes including diabetic complications", ignore_case = T)) ~ 1,
str_detect(string = TEXT, pattern = regex("does not show any evidence of diabetic retinopathy", ignore_case = T)) ~ 1,
TRUE ~ 0)) %>%
filter(EXCLUDE == 0)
retinopathy_filtered %>%
summarise(unique(NOTE_ID))
## # A tibble: 1 x 1
## `unique(NOTE_ID)`
## <int>
## 1 86
retinopathy_filtered %>% nlp_example_datatable()
Great, only one note remains, which was the goal after removing the notes that were negated or that referred to a different individual.
I now want to merge the data sets to obtain the notes that are identified as cases for any (or multiple) of the three diabetic complications.
all_complications <- rbind(neuropathy_filtered,
rbind(nephropathy_filtered, retinopathy_filtered))
all_complications %<>%
arrange(NOTE_ID)
all_complications %>%
group_by(NOTE_ID)
## # A tibble: 60 x 9
## # Groups: NOTE_ID [33]
## NOTE_ID NOTE_TYPE TEXT WORDS INDEX WINDOW_START WINDOW_END WINDOW EXCLUDE
## <int> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 3 Discharge… "CC: … neur… 151 141 161 She al… 0
## 2 6 Operative… "PREO… neph… 48 38 58 The pa… 0
## 3 7 Operative… "S - … Neur… 224 214 234 Planta… 0
## 4 12 Operative… "PREO… Neph… 6 1 16 PREOPE… 0
## 5 12 Operative… "PREO… renal 4 1 15 PREOPE… 0
## 6 12 Operative… "PREO… renal 12 2 23 DIAGNO… 0
## 7 12 Operative… "PREO… renal 27 17 38 right … 0
## 8 13 Operative… "PREO… renal 4 1 15 PREOPE… 0
## 9 13 Operative… "PREO… renal 20 10 31 chroni… 0
## 10 13 Operative… "PREO… renal 61 51 72 Michae… 0
## # … with 50 more rows
all_complications %>% nlp_example_datatable()
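The merged table does not record which search produced each window. One option, sketched below in base R on invented toy tables, is to add a label column before binding the rows and then cross-tabulate; the COMPLICATION column and the `_toy` data frames are assumptions, not part of the analysis above.

```r
# Invented stand-ins for the three filtered tables (NOTE_ID only).
neuropathy_toy <- data.frame(NOTE_ID = c(3, 7))
nephropathy_toy <- data.frame(NOTE_ID = c(7, 12))
retinopathy_toy <- data.frame(NOTE_ID = 86)

# Label each table with its complication before combining.
neuropathy_toy$COMPLICATION <- "neuropathy"
nephropathy_toy$COMPLICATION <- "nephropathy"
retinopathy_toy$COMPLICATION <- "retinopathy"

combined <- rbind(neuropathy_toy, nephropathy_toy, retinopathy_toy)
combined <- combined[order(combined$NOTE_ID), ]

# One row per note, one column per complication, TRUE where at least one window was found.
found <- table(combined$NOTE_ID, combined$COMPLICATION) > 0
print(found)
```

A note that appears in more than one search, such as the toy note 7, shows TRUE in several columns of the cross-tabulation.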
To compare my results with the gold standard, I need to record, for each note, which conditions were found and whether any condition was found.
# Flag, for every note, which complications the filtered windows identified
checks <- data.frame(NOTE_ID = diabetes_notes$NOTE_ID) %>%
  mutate(DIABETIC_NEUROPATHY = ifelse(NOTE_ID %in% neuropathy_filtered$NOTE_ID, 1, 0),
         DIABETIC_NEPHROPATHY = ifelse(NOTE_ID %in% nephropathy_filtered$NOTE_ID, 1, 0),
         DIABETIC_RETINOPATHY = ifelse(NOTE_ID %in% retinopathy_filtered$NOTE_ID, 1, 0),
         # A note counts as a case if any of the three complications was found
         ANY_DIABETIC_COMPLICATION = ifelse(DIABETIC_NEUROPATHY == 1 | DIABETIC_NEPHROPATHY == 1 | DIABETIC_RETINOPATHY == 1, 1, 0)) %>%
  select(NOTE_ID, ANY_DIABETIC_COMPLICATION, DIABETIC_NEUROPATHY, DIABETIC_NEPHROPATHY, DIABETIC_RETINOPATHY)
checks %<>%
mutate(ANY_MATCH = ifelse(goldstandard$ANY_DIABETIC_COMPLICATION == ANY_DIABETIC_COMPLICATION, 1, 0),
NEUROPATHY_MATCH = ifelse(goldstandard$DIABETIC_NEUROPATHY == DIABETIC_NEUROPATHY, 1, 0),
NEPHROPATHY_MATCH = ifelse(goldstandard$DIABETIC_NEPHROPATHY == DIABETIC_NEPHROPATHY, 1, 0),
RETINOPATHY_MATCH = ifelse(goldstandard$DIABETIC_RETINOPATHY == DIABETIC_RETINOPATHY, 1, 0))
checks_nomatch <- checks %>%
filter(ANY_MATCH == 0)
print(checks_nomatch[1:14, 1:9])
## NOTE_ID ANY_DIABETIC_COMPLICATION DIABETIC_NEUROPATHY DIABETIC_NEPHROPATHY
## 1 3 1 1 0
## 2 14 0 0 0
## 3 15 1 0 1
## 4 16 0 0 0
## 5 21 1 1 0
## 6 35 1 0 1
## 7 55 1 0 1
## 8 69 1 0 1
## 9 82 1 1 0
## 10 85 0 0 0
## 11 109 1 0 1
## 12 120 1 0 1
## 13 123 1 0 1
## 14 135 0 0 0
## DIABETIC_RETINOPATHY ANY_MATCH NEUROPATHY_MATCH NEPHROPATHY_MATCH
## 1 0 0 0 1
## 2 0 0 0 1
## 3 0 0 1 0
## 4 0 0 1 1
## 5 0 0 0 1
## 6 0 0 1 0
## 7 0 0 1 0
## 8 0 0 1 0
## 9 0 0 0 1
## 10 0 0 1 0
## 11 0 0 1 0
## 12 0 0 1 0
## 13 0 0 1 0
## 14 0 0 1 1
## RETINOPATHY_MATCH
## 1 1
## 2 1
## 3 1
## 4 0
## 5 1
## 6 1
## 7 1
## 8 1
## 9 1
## 10 1
## 11 1
## 12 1
## 13 1
## 14 0
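The comparison above lines up `checks` and `goldstandard` by row position, which is only safe if both tables list the notes in the same order. A more defensive variant joins on the note identifier first. The sketch below uses invented toy tables and assumes the gold standard carries a NOTE_ID column, which this document does not show.

```r
# Invented predictions and gold standard; the gold standard is deliberately in a different order.
predicted <- data.frame(NOTE_ID = c(1, 2, 3, 4),
                        ANY_DIABETIC_COMPLICATION = c(0, 1, 1, 0))
gold <- data.frame(NOTE_ID = c(3, 1, 4, 2),
                   ANY_DIABETIC_COMPLICATION = c(0, 0, 1, 1))

# Join on NOTE_ID so each prediction is compared with the label for the same note.
compared <- merge(predicted, gold, by = "NOTE_ID", suffixes = c("_PRED", "_GOLD"))
compared$ANY_MATCH <- as.integer(
  compared$ANY_DIABETIC_COMPLICATION_PRED == compared$ANY_DIABETIC_COMPLICATION_GOLD
)

mismatched_ids <- compared$NOTE_ID[compared$ANY_MATCH == 0]
print(mismatched_ids)
```

With a join, a missing or reordered row shows up as a dropped or mismatched NOTE_ID instead of silently shifting every comparison.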
Overall, my algorithm identified 33 unique notes with diabetic complications. After hand review of these notes, I was confident that 30 of the 33 were correctly classified and that 3 were incorrectly classified as “cases”.
After comparing against the gold standard, I found that 14 of the 141 total notes were misclassified: 10 notes were incorrectly identified as cases and 4 notes were missed.
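These counts are enough to fill in a confusion matrix for the "any complication" label and derive the usual summary measures. The sketch below only does arithmetic on the figures reported above (141 notes, 33 flagged, 10 false positives, 4 false negatives); the variable names are mine.

```r
total_notes <- 141
flagged <- 33          # notes my algorithm identified as cases
false_positives <- 10  # flagged, but not cases in the gold standard
false_negatives <- 4   # cases in the gold standard that I missed

true_positives <- flagged - false_positives
true_negatives <- total_notes - flagged - false_negatives

# Share of flagged notes that are real cases.
precision <- true_positives / (true_positives + false_positives)
# Share of real cases that were flagged (sensitivity).
recall <- true_positives / (true_positives + false_negatives)
# Share of all notes classified correctly.
accuracy <- (true_positives + true_negatives) / total_notes

print(round(c(precision = precision, recall = recall, accuracy = accuracy), 3))
```

Because most notes are true negatives, accuracy looks much better than precision, which makes precision and recall the more informative figures here.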
We will look at NOTE_ID 7, which is an operative note.
all_complications %>%
slice(3) %>% nlp_example_datatable()
This note was identified through the keyword “neuropathy”. It was not excluded due to negation or belonging to another individual. This individual’s medical record clearly states a “history of diabetic neuropathy”.
Next, we will look at note 3, which I identified as having diabetic neuropathy but which the gold standard does not list as a neuropathy case.
all_complications %>%
slice(1) %>% nlp_example_datatable()
Note 3 (shown above) was also identified when searching for neuropathy, but the information that explains its cause occurs far outside the window and would likely be caught only by hand review after reading the entire note. This case of neuropathy is due to a car accident, not diabetes.
An alternative approach to correcting note 109 would be to first exclude any notes that contain the text “No history of diabetes” in any section except family history. This would remove patients who do not have diabetes up front, so that the individuals identified through the remaining notes at least have diabetes. It would still be necessary to check that the passages identified by the keyword window search are about the individual in question and are not negated, but it could help remove known non-diabetics.
I also noted that many of my misclassifications were in nephropathy. I believe this is because I identified end-stage renal failure, which is not the same as diabetic nephropathy. This mistake could have been avoided if I had more clinical knowledge or a clinical expert to consult. Removing this keyword would probably reduce my false positive rate.
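A rough sketch of this pre-filter is shown below. It assumes that sections are separated by blank lines and that the family history section starts with the header "FAMILY HISTORY", as in the first note printed earlier; the two notes in the example are invented.

```r
# Invented notes: the first patient has no diabetes; the second has diabetes,
# and the phrase only appears in the family history section.
toy_notes <- c(
  "PAST MEDICAL HISTORY: No history of diabetes.\n\nFAMILY HISTORY: Father had renal failure.",
  "PAST MEDICAL HISTORY: Type 2 diabetes.\n\nFAMILY HISTORY: No history of diabetes in the family.\n\nSOCIAL HISTORY: Non-smoker."
)

# Drop the family history section: from its header up to the next blank line or the end of the note.
# (?s) lets "." match newlines, and ".*?" stops at the first section break.
without_family <- gsub("(?s)FAMILY HISTORY.*?(\\n\\n|$)", "", toy_notes, perl = TRUE)

# Flag notes that state the patient has no history of diabetes outside that section.
non_diabetic <- grepl("no history of diabetes", without_family, ignore.case = TRUE)
print(non_diabetic)
```

Notes flagged this way could be dropped before the keyword window search. Real notes are less regular than this toy example (headers vary and sections sometimes run together), so the section-stripping pattern would need checking against the actual text.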