Last updated: 2025-02-25

Checks: 6 1

Knit directory: QUAIL-Mex/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20241009)

The command set.seed(20241009) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 44474e3

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 44474e3. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.RData
    Ignored:    analysis/.Rhistory
    Ignored:    code/.DS_Store
    Ignored:    data/.DS_Store

Untracked files:
    Untracked:  HLTH_counts_by_SES.png
    Untracked:  data/c_pain.csv

Unstaged changes:
    Modified:   analysis/Plots_Latinx_research_week.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Plots_Latinx_research_week.Rmd) and HTML (docs/Plots_Latinx_research_week.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	44474e3	Paloma	2025-02-22	Create Plots_Latinx_research_week.Rmd

Data cleaning

#Loading file and setting empty cells as "NAs"
d <- read.csv("./data/01.SCREENING.csv", stringsAsFactors = TRUE, na.strings = c("", " ","NA", "N/A"))


 # extract duplicates and compare values (removed manually for now)
dup <- d$ID[duplicated(d$ID)] 
length(dup) # 38

[1] 138

print(dup)

  [1]   2   3   9  22  24  25  26  27  30  33  44  46  70  75  96 102 103 128
 [19] 146 162 163 163 167 180 190 191 201 202 301 301 302 302 303 304 305 306
 [37] 307 308 309 310 310 311 312 313 314 315 316 317 318 319 320 320 321 322
 [55] 323 323 324 325 326 327 328 329 330 330 331 332 333 334 334 335 336 337
 [73] 338 339 340 341 342 343 344 345 353 353 362 364 365 366 367 368 369 370
 [91] 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388
[109] 389 390 391 393 394 395 402 415 427 427 439 453 463 463 471 476 483 483
[127] 483 485 487 488 489 490 492 492 494 495 496 497

# remove duplicates
d <- d[!duplicated(d$ID),]
# confirm total number of participants
length(unique(d$ID)) # 433 rows, but 394 participants

[1] 399

# What numbers are missing? 
ID <- as.ordered(d$ID)
# first trip
setdiff(1:204, ID)

[1]  71 164

nrow(d[d$ID <=250,])

[1] 202

# second trip
setdiff(301:497, ID)

integer(0)

nrow(d[d$ID >=250,])

[1] 197

nrow(d)

[1] 399

 #make ID number the row names
rownames(d) <- d$ID

# Code NAs
d[d=='NA'] <- NA
d <- d %>% 
    replace_na()

# Count rows with NAs
nrow(d[rowSums(is.na(d)) > 0,]) # 12 rows

[1] 399

nrow(d[!complete.cases(d),]) # 12 rows

[1] 399

#NAs <- d[rowSums(is.na(d))> 0,]
#write.table(NAs, "240301_SES_Age_NAs.csv", row.names=FALSE)

# Select useful information
d <- d %>% 
  select(ID, SES_EDU_SC, SES_BTHR_SC, SES_CAR_SC, SES_INT_SC, SES_WRK_SC, SES_BEDR_SC)

# transform factors to numbers
for (i in c(2:length(d))) {
    d[,i] <- as.numeric(as.character(d[,i]))
}

Warning: NAs introduced by coercion
Warning: NAs introduced by coercion

# Keep rows with no missing data
d <- d[complete.cases(d),] # - 9 (dup)  # total 382 complete cases, total 394
nrow(d)

[1] 349

head(d)

  ID SES_EDU_SC SES_BTHR_SC SES_CAR_SC SES_INT_SC SES_WRK_SC SES_BEDR_SC
1  1         10          24          0         31         61          23
2  2         31          47         18         31         46          23
3  3         31           0          0          0         15           6
4  4         73          47          0         31         46          17
5  5         35          24          0         31         15          12
6  6         73          47          0         31         46          23

#write.csv(d, paste("./cleaned/", date, "_SES_clean.csv", sep=""))


# Calcular total SES score
d$SES_score <- rowSums(d[2:7], na.rm = TRUE)

# SES categories
d$SES <- ifelse(d$SES_score <= 47,"E", 
                ifelse(d$SES_score <= 89, "D-/E", 
                       ifelse(d$SES_score <= 111, "D", 
                              ifelse(d$SES_score <= 135, "C-",
                                     ifelse(d$SES_score <= 165, "C",
                                            ifelse(d$SES_score <= 204, "C+",
                                                   "A/B"))))))
head(d)

  ID SES_EDU_SC SES_BTHR_SC SES_CAR_SC SES_INT_SC SES_WRK_SC SES_BEDR_SC
1  1         10          24          0         31         61          23
2  2         31          47         18         31         46          23
3  3         31           0          0          0         15           6
4  4         73          47          0         31         46          17
5  5         35          24          0         31         15          12
6  6         73          47          0         31         46          23
  SES_score  SES
1       149    C
2       196   C+
3        52 D-/E
4       214  A/B
5       117   C-
6       220  A/B

combining data

#Loading file and setting empty cells as "NAs"
c <- read.csv("./data/Chronic_pain_illness.csv", stringsAsFactors = TRUE, na.strings = c("", " ","NA", "N/A"))

c$chronic <- as.factor(ifelse(c$HLTH_CPAIN_CAT == 1 | c$HLTH_CDIS_CAT == 1, "Yes", "No"))

m <- merge(d,c, by="ID")
dim(m)

[1] 349  12

m[m =='NA'] <- NA
m <- m %>% 
    replace_na()
m <- m %>% 
  drop_na()

m %>%
  select(ID, SES, chronic) %>%
  group_by(SES) %>%
  summarise(n_total = n())

# A tibble: 7 × 2
  SES   n_total
  <chr>   <int>
1 A/B        26
2 C          77
3 C+         48
4 C-         85
5 D          51
6 D-/E       49
7 E          10

agg<- count(m, SES, chronic)
agg2 <- pivot_wider(agg, 
            names_from = chronic,
            values_from = n)
agg2$Total <- agg2$Yes/(agg2$Yes + agg2$No)*100

#png(file= "HLTH_counts_by_SES.png", width = 700, height = 800 )
ggplot(agg2, aes(x = SES, y = Total)) +
    geom_bar(stat= "identity", fill = "orchid4", alpha = 0.9) +
     ylim(0, 55) + 
   theme_minimal()  +
   geom_text(aes(label = paste(round(Total, 1), "%", sep="")), 
            vjust = -0.5, 
            colour = "gray40", 
            size = 9) +
   theme(
    plot.title = element_text(hjust = 0.5, 
                              size = 32, face = "bold"),
    axis.title = element_text(size = 22),  
    axis.text = element_text(size = 26)  
  ) + 
  labs(title="Proportion of participants suffering from \n chronic pain or chronic disease") +
  xlab("Socioeconomic status") +
  ylab("Percent (%)")

#dev.off()

SES and PSS - work in progress

f<-read.csv("./data/02.HWISE_PSS.csv")
# Add more info to file
dim(f)

[1] 398  33

f$ID <- as.factor(f$ID)

# Calculate total score PSS per participant
f$t_pss <- 0
# change positive statements to negative values
pss <- c(20, 21, 22, 23, 25, 26, 29)
for (i in pss) {f[i] <- f[i]*-1 }
    for (i in 1:nrow(f)) { 
    sum(f[i, c(18:31)]) -> f$Total_PSS[i]
}
summary(f$Total_PSS)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
-13.000  -5.000  -2.000  -2.558   0.000   6.000       4

pss <- merge(m,f, by="ID")
dim(pss)

[1] 345  46

pss[pss =='NA'] <- NA



pss %>%
  select(ID, SES, chronic, Total_PSS) %>%
  drop_na(Total_PSS) %>%
  group_by(SES, chronic) %>%
  summarise(n_total = n(), mean = mean(Total_PSS))

`summarise()` has grouped output by 'SES'. You can override using the `.groups`
argument.

# A tibble: 14 × 4
# Groups:   SES [7]
   SES   chronic n_total  mean
   <chr> <fct>     <int> <dbl>
 1 A/B   No           21 -2.43
 2 A/B   Yes           5 -2.2 
 3 C     No           41 -3.24
 4 C     Yes          34 -1.97
 5 C+    No           34 -2.91
 6 C+    Yes          13 -3.69
 7 C-    No           61 -2.61
 8 C-    Yes          23 -2.70
 9 D     No           29 -2.45
10 D     Yes          21 -3.19
11 D-/E  No           33 -1.76
12 D-/E  Yes          16 -3.38
13 E     No            5 -1.4 
14 E     Yes           5 -4.6

agg<- count(pss, SES, chronic, Total_PSS)
agg2 <- pivot_wider(agg, 
            names_from = chronic,
            values_from = n)
agg2$Total <- agg2$Yes/(agg2$Yes + agg2$No)*100

png(file= "HLTH_counts_by_SES.png", width = 700, height = 800 )
ggplot(agg2, aes(x = SES, y = Total)) +
    geom_bar(stat= "identity", fill = "orchid4", alpha = 0.9) +
     ylim(0, 55) + 
   theme_minimal()  +
   geom_text(aes(label = paste(round(Total, 1), "%", sep="")), 
            vjust = -0.5, 
            colour = "gray40", 
            size = 9) +
   theme(
    plot.title = element_text(hjust = 0.5, 
                              size = 32, face = "bold"),
    axis.title = element_text(size = 22),  
    axis.text = element_text(size = 26)  
  ) + 
  labs(title="Proportion of participants suffering from \n chronic pain or chronic disease") +
  xlab("Socioeconomic status") +
  ylab("Percent (%)")

Warning: Removed 89 rows containing missing values or values outside the scale range
(`geom_bar()`).

Warning: Removed 52 rows containing missing values or values outside the scale range
(`geom_text()`).

dev.off()

quartz_off_screen 
                2

sessionInfo()

R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Detroit
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] RColorBrewer_1.1-3 viridis_0.6.5      viridisLite_0.4.2  plotly_4.10.4     
 [5] lubridate_1.9.3    forcats_1.0.0      stringr_1.5.1      dplyr_1.1.4       
 [9] purrr_1.0.2        readr_2.1.5        tidyr_1.3.1        tibble_3.2.1      
[13] ggplot2_3.5.1      tidyverse_2.0.0    rmarkdown_2.29    

loaded via a namespace (and not attached):
 [1] sass_0.4.9        utf8_1.2.4        generics_0.1.3    stringi_1.8.4    
 [5] hms_1.1.3         digest_0.6.37     magrittr_2.0.3    timechange_0.3.0 
 [9] evaluate_1.0.1    fastmap_1.2.0     rprojroot_2.0.4   workflowr_1.7.1  
[13] jsonlite_1.8.9    whisker_0.4.1     gridExtra_2.3     promises_1.3.0   
[17] httr_1.4.7        fansi_1.0.6       scales_1.3.0      lazyeval_0.2.2   
[21] jquerylib_0.1.4   cli_3.6.3         rlang_1.1.4       munsell_0.5.1    
[25] withr_3.0.2       cachem_1.1.0      yaml_2.3.10       tools_4.4.2      
[29] tzdb_0.4.0        colorspace_2.1-1  httpuv_1.6.15     vctrs_0.6.5      
[33] R6_2.5.1          lifecycle_1.0.4   git2r_0.35.0      htmlwidgets_1.6.4
[37] fs_1.6.5          pkgconfig_2.0.3   pillar_1.9.0      bslib_0.8.0      
[41] later_1.3.2       gtable_0.3.6      data.table_1.16.2 glue_1.8.0       
[45] Rcpp_1.0.13-1     xfun_0.49         tidyselect_1.2.1  rstudioapi_0.17.1
[49] knitr_1.49        farver_2.1.2      htmltools_0.5.8.1 labeling_0.4.3   
[53] compiler_4.4.2

Plots_Latinx_research_week

Paloma

2025-01-27

Data cleaning

combining data

SES and PSS - work in progress