---
title: "HBA2025_results"
author: "Paloma"
date: "2025-01-27"
output: workflowr::wflow_html
editor_options:
  chunk_output_type: console
---
Last updated: 2025-03-09
Checks: 6 passed, 1 warning
Knit directory: QUAIL-Mex/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you'll want to first commit it to the Git repo. If you're still working on the analysis, you can ignore this warning. When you're finished, you can run wflow_publish to commit the R Markdown file and build the HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it's best to always run the code in an empty environment.
The command set.seed(20241009) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 77cc810. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .RData
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/.RData
Ignored: analysis/.Rhistory
Ignored: analysis/Hrs_by_HWISE score.png
Ignored: code/.DS_Store
Ignored: data/.DS_Store
Unstaged changes:
Modified: analysis/HBA2025_cleaning.Rmd
Modified: analysis/tests.Rmd
Modified: data/Cleaned_Dataset_Screening_HWISE_PSS_V4.csv
Modified: data/Q9-10-11-29-31.csv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/HBA2025_cleaning.Rmd) and HTML (docs/HBA2025_cleaning.html) files. If you've configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 9502a37 | Paloma | 2025-03-09 | analyses |
html | 9502a37 | Paloma | 2025-03-09 | analyses |
Rmd | 8e33bbc | Paloma | 2025-03-08 | more vars |
html | 8e33bbc | Paloma | 2025-03-08 | more vars |
Rmd | 77cc174 | Paloma | 2025-03-08 | more plots |
html | 77cc174 | Paloma | 2025-03-08 | more plots |
Rmd | 7866aba | Paloma | 2025-03-07 | newplots |
html | 7866aba | Paloma | 2025-03-07 | newplots |
Rmd | 4c407eb | Paloma | 2025-03-07 | add question 28 |
Rmd | 0a00a41 | Paloma | 2025-03-06 | reg_analysis 2 |
html | 0a00a41 | Paloma | 2025-03-06 | reg_analysis 2 |
Rmd | f0811f0 | Paloma | 2025-03-04 | reduced NAs |
html | f0811f0 | Paloma | 2025-03-04 | reduced NAs |
html | 74c067f | Paloma | 2025-03-04 | Update HBA2025_cleaning.html |
Rmd | 7e9bbe9 | Paloma | 2025-03-04 | Merge branch ‘main’ of https://github.com/lasisilab/QUAIL-Mex |
Rmd | 3704a5a | Paloma | 2025-03-04 | add more vars |
html | 3704a5a | Paloma | 2025-03-04 | add more vars |
Rmd | 16af92d | Tina Lasisi | 2025-03-04 | Updating files |
html | 16af92d | Tina Lasisi | 2025-03-04 | Updating files |
Rmd | 1b8f52e | Paloma | 2025-03-03 | cleaning |
Here you will find the code used to obtain the results presented at the 2025 annual meeting of the HBA.
Abstract:
Coping with water insecurity: women’s strategies and emotional responses in Iztapalapa, Mexico City
Water insecurity in urban areas presents distinctive challenges, particularly in marginalized communities. While past studies have documented how households adapt to poor water services, many of these coping strategies come at a significant personal cost. Here we examine the coping strategies and emotional impacts of unreliable water services among 400 women in Iztapalapa, Mexico City. Data were collected through surveys administered in Fall 2022 and Spring 2023. We assessed household water access, water management practices, and emotional responses to local water services.
Results indicate that during acute water shortages, women can spend extended periods (several hours, or sometimes days) waiting for water trucks. Additionally, 57% of respondents reported feeling frustrated or angry about their water situation, while around 20% experienced family conflicts over water use or community-level conflicts around water management, often involving water vendors or government services.
This study offers one of the first in-depth examinations of how water insecurity specifically affects women in Iztapalapa, a densely populated region of Mexico City with severe water access challenges. The findings highlight the urgent need for policy interventions that address water insecurity with a gender-sensitive approach, recognizing the disproportionate burden placed on women as primary water managers in their households.
# Load necessary libraries
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(tidyr)
# Load the dataset
data_path = "data"
# Screening questionnaire data
df = read.csv(file.path(data_path, "01.SCREENING.csv"), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
# Convert all character columns to NA where values match missing patterns
df <- df %>%
mutate(across(where(is.character), ~ ifelse(. %in% c("", "N/A", "NA", "pending"), NA, .)))
# Remove duplicate IDs and keep rows with at least 10 non-NA values
df_filtered <- df %>%
mutate(Non_NA_Count = rowSums(!is.na(.))) %>%
filter(!duplicated(ID) & Non_NA_Count >= 10) %>%
select(-Non_NA_Count) # Remove helper column
# Display the filtered dataset
print(dim(df_filtered)) # Show dimensions for confirmation
[1] 385 35
write.csv(df_filtered, file.path(data_path, "Filtered_Screening.csv"), row.names = FALSE)
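# (Illustrative check, not part of the original pipeline.) Since 385 of the
# rows read in were kept, it can help to see why the remaining rows were
# dropped: duplicated IDs versus rows with fewer than 10 non-NA values.
cat("Rows read in:", nrow(df), "\n")
cat("Duplicated IDs:", sum(duplicated(df$ID)), "\n")
cat("Rows with fewer than 10 non-NA values:", sum(rowSums(!is.na(df)) < 10), "\n")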
d <- df_filtered
# Confirm total number of unique participants
cat("Total unique participants:", length(unique(d$ID)), "\n")
Total unique participants: 385
# Identify missing ID numbers in the first and second trip ranges
missing_first_trip <- setdiff(1:204, d$ID)
cat("Missing IDs in the first trip:", missing_first_trip, "\n")
Missing IDs in the first trip: 71 164
missing_second_trip <- setdiff(301:497, d$ID)
cat("Missing IDs in the second trip:", missing_second_trip, "\n")
Missing IDs in the second trip: 346 347 348 349 350 351 352 354 355 356 357 358 360 375
# Count participants within ID ranges
cat("Num of Participants first trip ≤ 250:", sum(d$ID <= 250, na.rm = TRUE), "\n")
Num of Participants first trip ≤ 250: 202
cat("Num of Participants second trip ≥ 250:", sum(d$ID >= 250, na.rm = TRUE), "\n")
Num of Participants second trip ≥ 250: 183
# Confirm total number of rows
cat("Total number of rows:", nrow(d), "\n")
Total number of rows: 385
# Set ID as row names (ensuring IDs are unique and non-NA)
if (!any(duplicated(d$ID)) && !any(is.na(d$ID))) {
rownames(d) <- as.character(d$ID)
} else {
warning("Some IDs are duplicated or NA, row names were not set.")
}
# Count and report rows containing NAs
num_na_rows <- sum(!complete.cases(d))
cat("Rows with missing values:", num_na_rows, "\n")
Rows with missing values: 385
# Identify columns that start with "SES" and end with "SC"
ses_sc_columns <- grep("^SES.*SC$", names(d), value = TRUE)
# Count total number of NA values in each SES_SC column
na_counts <- colSums(is.na(d[ses_sc_columns]))
# Count total NA values across all SES_SC columns
total_na_count <- sum(na_counts)
# Print results
cat("Number of NA values per SES_SC column:\n")
Number of NA values per SES_SC column:
print(na_counts)
SES_EDU_SC SES_BTHR_SC SES_CAR_SC SES_INT_SC SES_WRK_SC SES_BEDR_SC
14 16 5 2 2 4
cat("\nTotal NA values across all SES_SC columns:", total_na_count, "\n")
Total NA values across all SES_SC columns: 43
# Check if SES_SC columns exist before proceeding
if (length(ses_sc_columns) > 0) {
# Convert SES_SC columns to numeric (handles factors and character types)
d[ses_sc_columns] <- lapply(d[ses_sc_columns], function(x) as.numeric(as.character(x)))
# Sum SES_SC columns row-wise, keeping NA if any SES_SC value is missing
d <- d %>%
rowwise() %>%
mutate(SES_SC_Total = if_else(any(is.na(c_across(all_of(ses_sc_columns)))),
NA_real_,
sum(c_across(all_of(ses_sc_columns)), na.rm = TRUE))) %>%
ungroup()
# Display confirmation message
cat("Added SES_SC_Total column, maintaining NA where any SES_SC column has NA.\n")
} else {
cat("No SES_SC columns found in the dataset.\n")
}
Warning in FUN(X[[i]], ...): NAs introduced by coercion
Warning in FUN(X[[i]], ...): NAs introduced by coercion
Added SES_SC_Total column, maintaining NA where any SES_SC column has NA.
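The two coercion warnings above mean that a couple of SES_*_SC columns contain text entries that cannot be parsed as numbers and were turned into NA. A small diagnostic like the one below (not part of the original pipeline; it reads the untouched values back from df_filtered) lists the offending entries so they can be corrected upstream if desired.
# List the SES_*_SC values that fail numeric coercion
for (col in ses_sc_columns) {
  raw <- df_filtered[[col]]
  bad <- !is.na(raw) & is.na(suppressWarnings(as.numeric(as.character(raw))))
  if (any(bad)) cat(col, ":", unique(as.character(raw[bad])), "\n")
}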
# Count the number of NAs in the SES_SC_Total column
num_na_ses_sc_total <- sum(is.na(d$SES_SC_Total))
# Print the count
cat("Number of NA values in SES_SC_Total:", num_na_ses_sc_total, "\n")
Number of NA values in SES_SC_Total: 36
print(d[is.na(d$SES_SC_Total),c("ID",ses_sc_columns)], n=20)
# A tibble: 36 × 7
ID SES_EDU_SC SES_BTHR_SC SES_CAR_SC SES_INT_SC SES_WRK_SC SES_BEDR_SC
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 8 31 NA 18 31 15 12
2 11 35 NA 0 31 46 23
3 12 35 47 18 31 61 NA
4 18 NA NA NA NA NA NA
5 22 31 NA NA NA NA NA
6 57 31 NA 0 31 15 6
7 58 43 NA 0 31 31 23
8 81 NA 47 0 0 31 23
9 85 31 47 18 31 61 NA
10 95 NA 24 18 0 61 12
11 101 NA 47 0 31 46 17
12 103 NA 24 18 31 31 17
13 107 NA 47 0 31 31 12
14 108 NA 24 18 31 31 12
15 109 35 NA 0 31 46 23
16 125 43 NA 0 31 31 23
17 127 73 NA 0 31 61 23
18 156 NA 0 0 0 31 6
19 182 31 NA 18 31 15 17
20 194 23 NA 0 0 15 6
# ℹ 16 more rows
# Calculate the percentage of missing values per column
missing_percentage <- colMeans(is.na(d)) * 100
# Get columns with more than 10% missing data
columns_to_remove <- names(missing_percentage[missing_percentage > 10])
length(columns_to_remove)
[1] 11
dim(d)
[1] 385 36
# Print the selected columns
cat("Columns with more than 10% missing values:\n")
Columns with more than 10% missing values:
print(columns_to_remove)
[1] "D_LOC" "D_LOC_CODE" "D_LOC_INFO" "HLTH_CPAIN_CAT"
[5] "HLTH_CDIS_CAT" "D_HH_SIZE_INFO" "D_EMP_DAYS" "W_HH_SIZE"
[9] "W_STORAGE" "D_INFO" "SES_INFO"
d <- d %>% select(-all_of(columns_to_remove))
dim(d)
[1] 385 25
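# (Optional diagnostic, not in the original pipeline.) The missing_percentage
# vector computed above is still available, so the dropped columns can be
# ranked by how incomplete they were before deciding whether any are worth
# revisiting later.
round(sort(missing_percentage[columns_to_remove], decreasing = TRUE), 1)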
head(d)
# A tibble: 6 × 25
ID D_YRBR D_AGE D_LOC_TIME HLTH_SMK HLTH_TTM HLTH_MED HLTH_CPAIN HLTH_CDIS
<int> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1987 35 35 "No" No "No" No "Tuve co…
2 2 1990 32 32 "Yes" No "Lorata… No "No"
3 3 1992 30 8 "No" No "No" No "No"
4 4 1982 40 32 "No" No "No" No "No"
5 5 1976 46 45 "No" No "Yes,om… Yes "Dolor c…
6 6 1990 32 8 "a veces… no "no" No "no"
# ℹ 16 more variables: D_HH_SIZE <chr>, D_CHLD <chr>, D_EMP_TYPE <chr>,
# SES_EDU_CAT <chr>, SES_EDU_SC <dbl>, SES_BTHR_CAT <chr>, SES_BTHR_SC <dbl>,
# SES_CAR_CAT <chr>, SES_CAR_SC <dbl>, SES_INT_CAT <chr>, SES_INT_SC <dbl>,
# SES_WRK_CAT <chr>, SES_WRK_SC <dbl>, SES_BEDR_CAT <chr>, SES_BEDR_SC <dbl>,
# SES_SC_Total <dbl>
# Function to convert non-numeric values to "other"
convert_to_other <- function(x) {
ifelse(grepl("^[0-9]+$", x) | x == "other" | is.na(x), x, "other")
}
# Keep purely numeric entries in the selected columns and collapse free-text answers to "other"
selected_columns <- c("D_CHLD", "D_HH_SIZE")
d <- d %>%
mutate(across(all_of(selected_columns), ~ convert_to_other(as.character(.))))
# Specify the columns where transformation should be applied
columns_to_transform <- c("HLTH_SMK", "HLTH_TTM", "HLTH_CPAIN")
# Convert "yes" to 1, "no" to 0, and everything else to "other"
d <- d %>%
mutate(across(all_of(columns_to_transform), ~ case_when(
. == "Yes" ~ "1",
. == "No" ~ "0",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "yes" or "no"
TRUE ~ NA_character_
)))
# Convert numeric-coded columns to factors where applicable
d <- d %>%
mutate(across(where(~ all(. %in% c("0", "1", "other", NA_character_))), as.factor)) # Ensure consistency
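# (Quick sanity check, not part of the original analysis.) Tabulating the
# recoded yes/no columns, including NAs, confirms that every original value
# ended up as "0", "1", "other", or NA.
lapply(d[columns_to_transform], table, useNA = "ifany")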
# Convert variables to numeric when appropriate
## Pending: D_YRBR, D_LOC_TIME, D_CHLD, D_HH_SIZE
print(d)
# A tibble: 385 × 25
ID D_YRBR D_AGE D_LOC_TIME HLTH_SMK HLTH_TTM HLTH_MED HLTH_CPAIN HLTH_CDIS
<int> <chr> <int> <chr> <fct> <fct> <chr> <fct> <chr>
1 1 1987 35 35 0 0 "No" 0 "Tuve co…
2 2 1990 32 32 1 0 "Lorata… 0 "No"
3 3 1992 30 8 0 0 "No" 0 "No"
4 4 1982 40 32 0 0 "No" 0 "No"
5 5 1976 46 45 0 0 "Yes,om… 1 "Dolor c…
6 6 1990 32 8 other other "no" 0 "no"
7 7 1997 25 <NA> other other "no" 1 "dolor c…
8 8 1985 37 10 0 other "no" 0 "No diab…
9 9 1982 40 40 0 0 "No" other "No"
10 10 1979 49 43 0 0 "No" 0 "No"
# ℹ 375 more rows
# ℹ 16 more variables: D_HH_SIZE <chr>, D_CHLD <chr>, D_EMP_TYPE <chr>,
# SES_EDU_CAT <chr>, SES_EDU_SC <dbl>, SES_BTHR_CAT <chr>, SES_BTHR_SC <dbl>,
# SES_CAR_CAT <chr>, SES_CAR_SC <dbl>, SES_INT_CAT <chr>, SES_INT_SC <dbl>,
# SES_WRK_CAT <chr>, SES_WRK_SC <dbl>, SES_BEDR_CAT <chr>, SES_BEDR_SC <dbl>,
# SES_SC_Total <dbl>
summary(d)
ID D_YRBR D_AGE D_LOC_TIME HLTH_SMK
Min. : 1 Length:385 Min. :18.0 Length:385 0 :268
1st Qu.: 98 Class :character 1st Qu.:26.0 Class :character 1 : 55
Median :195 Mode :character Median :32.0 Mode :character other: 60
Mean :245 Mean :32.1 NA's : 2
3rd Qu.:401 3rd Qu.:38.0
Max. :497 Max. :49.0
NA's :2
HLTH_TTM HLTH_MED HLTH_CPAIN HLTH_CDIS
0 :241 Length:385 0 :281 Length:385
other:144 Class :character 1 : 51 Class :character
Mode :character other: 53 Mode :character
D_HH_SIZE D_CHLD D_EMP_TYPE SES_EDU_CAT
Length:385 Length:385 Length:385 Length:385
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
SES_EDU_SC SES_BTHR_CAT SES_BTHR_SC SES_CAR_CAT
Min. : 10.00 Length:385 Min. : 0.00 Length:385
1st Qu.: 31.00 Class :character 1st Qu.: 0.00 Class :character
Median : 31.00 Mode :character Median :24.00 Mode :character
Mean : 36.34 Mean :22.28
3rd Qu.: 43.00 3rd Qu.:24.00
Max. :101.00 Max. :47.00
NA's :15 NA's :17
SES_CAR_SC SES_INT_CAT SES_INT_SC SES_WRK_CAT
Min. : 0.000 Length:385 Min. : 0.00 Length:385
1st Qu.: 0.000 Class :character 1st Qu.:31.00 Class :character
Median : 0.000 Mode :character Median :31.00 Mode :character
Mean : 6.003 Mean :24.12
3rd Qu.:18.000 3rd Qu.:31.00
Max. :37.000 Max. :31.00
NA's :5 NA's :2
SES_WRK_SC SES_BEDR_CAT SES_BEDR_SC SES_SC_Total
Min. : 0.00 Length:385 Min. : 0.00 Min. : 25.0
1st Qu.:15.00 Class :character 1st Qu.:12.00 1st Qu.:104.0
Median :31.00 Mode :character Median :12.00 Median :129.0
Mean :31.37 Mean :13.27 Mean :133.1
3rd Qu.:46.00 3rd Qu.:17.00 3rd Qu.:159.0
Max. :61.00 Max. :23.00 Max. :263.0
NA's :2 NA's :4 NA's :36
# Keep only the relevant, cleaned columns
columns_to_keep2 <- c("ID", "D_YRBR", "D_LOC_TIME", "D_AGE", "D_HH_SIZE", "D_CHLD", "HLTH_SMK", "SES_SC_Total")
d <- d %>% select(all_of(columns_to_keep2))
write.csv(d, file.path(data_path, "00.SCREENING_V2.csv"), row.names = FALSE)
file2 <- "02.HWISE_PSS.csv"
file3 <- "Chronic_pain_illness.csv"
df1 <- d
df2 <- read.csv(file.path(data_path, file2), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
df3 <- read.csv(file.path(data_path, file3), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
# Identify the common column for merging
common_column <- "ID" # Change this if needed
# Ensure the common column is of the same type in all datasets
df1[[common_column]] <- as.character(df1[[common_column]])
df2[[common_column]] <- as.character(df2[[common_column]])
df3[[common_column]] <- as.character(df3[[common_column]])
# Merge datasets sequentially by the common column
merged_df <- df1 %>%
full_join(df2, by = common_column) %>%
full_join(df3, by = common_column)
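# (Illustrative check, not in the original pipeline.) full_join() keeps every
# ID from every source, so it helps to know how much the three ID sets overlap
# before interpreting the merged row count.
cat("IDs in screening data:", length(unique(df1$ID)), "\n")
cat("IDs in HWISE/PSS data:", length(unique(df2$ID)), "\n")
cat("IDs in chronic pain/illness data:", length(unique(df3$ID)), "\n")
cat("IDs present in all three:", length(Reduce(intersect, list(df1$ID, df2$ID, df3$ID))), "\n")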
colnames(merged_df)
[1] "ID" "D_YRBR" "D_LOC_TIME" "D_AGE"
[5] "D_HH_SIZE" "D_CHLD" "HLTH_SMK" "SES_SC_Total"
[9] "SEASON" "W_WS_LOC" "HW_TOTAL" "HW_WORRY"
[13] "HW_INTERR" "HW_CLOTHES" "HW_PLANS" "HW_FOOD"
[17] "HW_HANDS" "HW_BODY" "HW_DRINK" "HW_ANGRY"
[21] "HW_SLEEP" "HW_NONE" "HW_SHAME" "HW_INFO"
[25] "PSS1" "PSS2" "PSS3" "PSS4"
[29] "PSS5" "PSS6" "PSS7" "PSS8"
[33] "PSS9" "PSS10" "PSS11" "PSS12"
[37] "PSS13" "PSS14" "PSS_TOTAL" "PSS_INFO"
[41] "HLTH_CPAIN_CAT" "HLTH_CDIS_CAT"
cols_to_remove<- c("HW_INFO", "PSS_INFO", "PSS_TOTAL", "HW_TOTAL")
merged_df <- merged_df %>% select(-all_of(cols_to_remove))
##### Calculate final HW score
# Identify columns that start with "HW_"
hw_columns <- grep("^HW_", names(merged_df), value = TRUE)
# Sum up all HW_ columns row-wise, keeping NA if any HW_ column has NA
merged_df <- merged_df %>%
rowwise() %>%
mutate(HW_TOTAL = if_else(any(is.na(c_across(all_of(hw_columns)))),
NA_real_,
sum(c_across(all_of(hw_columns)),
na.rm = TRUE))) %>%
ungroup()
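# (Sketch of an equivalent, vectorised alternative.) rowSums() without na.rm
# already returns NA whenever any HW_ item is missing, so it reproduces the
# rowwise()/c_across() logic above in a single call.
hw_total_check <- rowSums(merged_df[hw_columns])
all(hw_total_check == merged_df$HW_TOTAL, na.rm = TRUE)   # should be TRUE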
# Print merged dataset dimensions
cat("Merged dataset dimensions:", dim(merged_df), "\n")
Merged dataset dimensions: 400 39
print(merged_df, n=20)
# A tibble: 400 × 39
ID D_YRBR D_LOC_TIME D_AGE D_HH_SIZE D_CHLD HLTH_SMK SES_SC_Total SEASON
<chr> <chr> <chr> <int> <chr> <chr> <fct> <dbl> <chr>
1 1 1987 35 35 4 0 0 149 Oct-Dec
2 2 1990 32 32 12 2 1 196 Oct-Dec
3 3 1992 8 30 7 2 0 52 Oct-Dec
4 4 1982 32 40 4 1 0 214 Oct-Dec
5 5 1976 45 46 4 other 0 117 Oct-Dec
6 6 1990 8 32 6 1 other 220 Oct-Dec
7 7 1997 <NA> 25 4 2 other 130 Oct-Dec
8 8 1985 10 37 4 2 0 NA Oct-Dec
9 9 1982 40 40 6 4 0 71 Oct-Dec
10 10 1979 43 49 3 1 0 117 Oct-Dec
11 11 1992 21 30 10 2 0 NA Oct-Dec
12 12 1996 26 26 17 2 0 NA Oct-Dec
13 13 1998 24 24 6 2 0 159 Oct-Dec
14 14 2000 22 22 6 0 0 132 Oct-Dec
15 15 1997 13 25 5 1 1 113 Oct-Dec
16 16 1990 14 32 5 other 0 104 Oct-Dec
17 17 1988 34 34 6 3 0 127 Oct-Dec
18 18 1978 <NA> 44 <NA> <NA> <NA> NA Oct-Dec
19 19 1991 10 31 4 2 0 110 Oct-Dec
20 20 1992 11 30 10 2 0 199 Oct-Dec
# ℹ 380 more rows
# ℹ 30 more variables: W_WS_LOC <chr>, HW_WORRY <int>, HW_INTERR <int>,
# HW_CLOTHES <int>, HW_PLANS <int>, HW_FOOD <int>, HW_HANDS <int>,
# HW_BODY <int>, HW_DRINK <int>, HW_ANGRY <int>, HW_SLEEP <int>,
# HW_NONE <int>, HW_SHAME <int>, PSS1 <int>, PSS2 <int>, PSS3 <int>,
# PSS4 <int>, PSS5 <int>, PSS6 <int>, PSS7 <int>, PSS8 <int>, PSS9 <int>,
# PSS10 <int>, PSS11 <int>, PSS12 <int>, PSS13 <int>, PSS14 <int>, …
# Save the merged dataset
write.csv(merged_df, file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS.csv"), row.names = FALSE)
cat("Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS.csv'.\n")
Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS.csv'.
# Load the cleaned dataset and the weekly water-supply hours data
df1 = read.csv(file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS.csv"), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
df2 = read.csv(file.path(data_path, "hours_water_supply.csv"), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
# Merge the two datasets by ID
merged_df <- df1 %>%
full_join(df2, by = "ID")
seas <- c("SEASON")
merged_df <- merged_df %>%
mutate(across(all_of(seas), ~ case_when(
. == "Oct-Dec" ~ "1",
. == "Apr-Jun" ~ "0",
. == "Oct-Dec" ~ "1" has already matched above, so assign "other" to any remaining non-missing value that is not "Oct-Dec" or "Apr-Jun"
!is.na(.) ~ "other",
TRUE ~ NA_character_
)))
# Specify the columns where the transformation should be applied
columns_to_transform <- c("W_WC_WI")
# Convert "Intermitente" (intermittent supply) to 1, "Continua" (continuous supply) to 0, and everything else to "other"
merged_df <- merged_df %>%
mutate(across(all_of(columns_to_transform), ~ case_when(
. == "Intermitente" ~ "1",
. == "Continua" ~ "0",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "Intermitente" or "Continua"
TRUE ~ NA_character_
)))
columns_to_transform <- c("W_WS_LOC")
# Convert "WI" to 1, "WS" to 0, and everything else to "other"
merged_df <- merged_df %>%
mutate(across(all_of(columns_to_transform), ~ case_when(
. == "WI" ~ "1",
. == "WS" ~ "0",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "WI" or "WS"
TRUE ~ NA_character_
)))
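# The three recodes above (SEASON, W_WC_WI, W_WS_LOC) follow the same pattern:
# two known labels become "1"/"0" and anything else non-missing becomes "other".
# A small helper (hypothetical, not in the original code) would avoid the
# repetition; shown here as a sketch only.
recode_binary <- function(x, one, zero) {
  dplyr::case_when(
    x == one  ~ "1",
    x == zero ~ "0",
    !is.na(x) ~ "other",
    TRUE      ~ NA_character_
  )
}
# e.g. the W_WS_LOC block above could have been written as:
# merged_df <- merged_df %>% mutate(W_WS_LOC = recode_binary(W_WS_LOC, "WI", "WS"))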
# Define the list of variables to convert to numeric
col_to_num <- c("D_YRBR", "D_LOC_TIME", "D_HH_SIZE", "D_CHLD", "HRS_WEEK", "HLTH_SMK", "W_WS_LOC", "SEASON", "W_WC_WI")
# Convert the specified columns to numeric
merged_df <- merged_df %>%
mutate(across(all_of(col_to_num), ~as.numeric(as.character(.)), .names = "{.col}"))
Warning: There were 7 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(all_of(col_to_num), ~as.numeric(as.character(.)), .names
= "{.col}")`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 6 remaining warnings.
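# (Diagnostic sketch, not in the original pipeline.) These warnings are
# expected: values recoded to "other" earlier in the pipeline, plus any
# free-text entries, cannot be parsed as numbers and become NA. Counting NAs
# in the converted columns shows where that happened.
colSums(is.na(merged_df[col_to_num]))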
# Print merged dataset dimensions
cat("Merged dataset dimensions:", dim(merged_df), "\n")
Merged dataset dimensions: 402 41
head(merged_df, 20)
ID D_YRBR D_LOC_TIME D_AGE D_HH_SIZE D_CHLD HLTH_SMK SES_SC_Total SEASON
1 1 1987 35 35 4 0 0 149 1
2 2 1990 32 32 12 2 1 196 1
3 3 1992 8 30 7 2 0 52 1
4 4 1982 32 40 4 1 0 214 1
5 5 1976 45 46 4 NA 0 117 1
6 6 1990 8 32 6 1 NA 220 1
7 7 1997 NA 25 4 2 NA 130 1
8 8 1985 10 37 4 2 0 NA 1
9 9 1982 40 40 6 4 0 71 1
10 10 1979 43 49 3 1 0 117 1
11 11 1992 21 30 10 2 0 NA 1
12 12 1996 26 26 17 2 0 NA 1
13 13 1998 24 24 6 2 0 159 1
14 14 2000 22 22 6 0 0 132 1
15 15 1997 13 25 5 1 1 113 1
16 16 1990 14 32 5 NA 0 104 1
17 17 1988 34 34 6 3 0 127 1
18 18 1978 NA 44 NA NA NA NA 1
19 19 1991 10 31 4 2 0 110 1
20 20 1992 11 30 10 2 0 199 1
W_WS_LOC HW_WORRY HW_INTERR HW_CLOTHES HW_PLANS HW_FOOD HW_HANDS HW_BODY
1 1 2 0 0 0 0 0 0
2 1 0 0 0 0 0 0 1
3 1 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0
5 1 2 1 1 1 1 0 1
6 1 1 1 2 1 3 1 1
7 1 0 1 1 1 1 1 1
8 1 2 2 2 0 0 2 0
9 1 0 2 0 0 0 0 0
10 1 2 2 2 0 0 0 0
11 1 1 0 0 1 1 0 2
12 1 1 1 0 0 0 0 0
13 1 1 1 0 1 0 0 0
14 1 1 2 0 0 0 0 0
15 1 0 0 1 0 0 0 0
16 1 3 2 2 0 1 0 3
17 1 1 1 2 1 0 0 1
18 1 0 1 1 0 0 0 0
19 1 1 1 2 1 0 0 1
20 1 2 2 1 2 1 0 2
HW_DRINK HW_ANGRY HW_SLEEP HW_NONE HW_SHAME PSS1 PSS2 PSS3 PSS4 PSS5 PSS6
1 0 1 0 1 0 2 3 2 4 3 2
2 0 2 0 2 0 2 2 3 3 3 4
3 0 1 0 0 0 2 2 2 2 2 3
4 0 1 0 0 1 2 3 2 3 3 3
5 0 1 0 3 1 2 2 3 2 2 2
6 0 1 0 2 2 2 2 3 2 3 3
7 0 1 0 1 1 2 2 2 2 3 2
8 0 0 0 0 0 2 1 3 3 3 2
9 0 1 0 0 0 2 3 2 3 2 2
10 0 0 0 0 0 1 2 1 3 3 3
11 0 3 0 3 0 3 3 4 4 2 1
12 0 1 0 1 1 3 4 4 1 2 2
13 0 0 1 1 0 4 3 4 4 2 3
14 0 0 0 0 0 2 2 2 3 3 3
15 0 0 0 0 0 3 3 3 2 2 2
16 0 3 0 3 2 3 3 2 2 1 3
17 0 0 0 0 0 2 0 2 3 3 3
18 1 2 1 1 0 2 1 2 3 2 2
19 0 0 0 2 0 1 3 3 3 2 2
20 1 2 0 0 1 1 2 4 2 1 3
PSS7 PSS8 PSS9 PSS10 PSS11 PSS12 PSS13 PSS14 HLTH_CPAIN_CAT HLTH_CDIS_CAT
1 2 1 2 3 1 4 2 1 0 1
2 3 2 3 2 2 4 3 2 0 0
3 3 2 3 2 2 2 3 2 0 0
4 3 1 3 2 2 2 3 1 0 0
5 3 2 2 1 3 3 2 3 1 0
6 4 1 3 3 2 3 1 2 0 0
7 2 2 2 2 2 2 2 2 1 0
8 3 1 3 3 4 2 2 2 0 1
9 2 3 3 2 2 2 2 2 1 0
10 3 2 3 0 2 3 3 2 0 0
11 2 3 3 1 4 3 4 1 1 0
12 2 2 2 1 4 3 2 4 0 0
13 4 2 1 1 3 3 2 3 0 0
14 3 2 3 3 2 2 3 1 0 0
15 1 2 1 1 3 3 2 3 0 0
16 3 1 2 2 2 3 2 4 0 0
17 3 2 3 2 2 3 3 3 0 0
18 2 3 2 2 3 3 1 2 1 0
19 2 3 2 3 3 3 3 2 1 0
20 3 2 3 2 3 3 2 2 0 0
HW_TOTAL W_WC_WI HRS_WEEK
1 4 1 168.00
2 5 0 35.00
3 1 1 NA
4 2 1 28.00
5 12 1 156.00
6 15 1 12.00
7 9 1 11.00
8 8 1 39.00
9 3 1 5.25
10 6 1 24.50
11 11 1 20.00
12 5 1 77.00
13 5 1 21.00
14 3 1 21.00
15 1 1 84.00
16 19 1 35.00
17 6 1 9.00
18 7 1 17.50
19 8 1 8.00
20 14 1 42.00
summary(merged_df)
ID D_YRBR D_LOC_TIME D_AGE
Min. : 1.0 Min. :1976 Min. : 2.00 Min. :18.00
1st Qu.:101.2 1st Qu.:1984 1st Qu.:13.00 1st Qu.:26.00
Median :201.5 Median :1990 Median :23.00 Median :32.00
Mean :248.3 Mean :1990 Mean :22.46 Mean :32.11
3rd Qu.:396.8 3rd Qu.:1996 3rd Qu.:31.00 3rd Qu.:38.00
Max. :497.0 Max. :2005 Max. :46.00 Max. :49.00
NA's :17 NA's :36 NA's :18
D_HH_SIZE D_CHLD HLTH_SMK SES_SC_Total
Min. : 2.000 Min. :0.000 Min. :0.0000 Min. : 25.0
1st Qu.: 4.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:104.2
Median : 5.000 Median :2.000 Median :0.0000 Median :129.5
Mean : 5.578 Mean :1.857 Mean :0.1698 Mean :133.2
3rd Qu.: 6.000 3rd Qu.:3.000 3rd Qu.:0.0000 3rd Qu.:159.0
Max. :40.000 Max. :8.000 Max. :1.0000 Max. :263.0
NA's :23 NA's :24 NA's :78 NA's :52
SEASON W_WS_LOC HW_WORRY HW_INTERR
Min. :0.0000 Min. :0.0000 Min. :0.00 Min. :0.000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:1.000
Median :1.0000 Median :1.0000 Median :1.00 Median :1.000
Mean :0.5088 Mean :0.5063 Mean :1.16 Mean :1.316
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:2.00 3rd Qu.:2.000
Max. :1.0000 Max. :1.0000 Max. :3.00 Max. :3.000
NA's :3 NA's :3 NA's :3 NA's :3
HW_CLOTHES HW_PLANS HW_FOOD HW_HANDS
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.0000 Median :1.0000 Median :0.0000 Median :0.0000
Mean :0.9874 Mean :0.8571 Mean :0.4422 Mean :0.2682
3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :3.0000 Max. :3.0000 Max. :3.0000 Max. :3.0000
NA's :4 NA's :3 NA's :4 NA's :3
HW_BODY HW_DRINK HW_ANGRY HW_SLEEP
Min. :0.0000 Min. :0.0000 Min. :0.00 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median :1.00 Median :0.0000
Mean :0.6792 Mean :0.4261 Mean :1.04 Mean :0.2638
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:2.00 3rd Qu.:0.0000
Max. :3.0000 Max. :3.0000 Max. :3.00 Max. :2.0000
NA's :3 NA's :3 NA's :4 NA's :4
HW_NONE HW_SHAME PSS1 PSS2
Min. :0.0000 Min. :0.0000 Min. :0.000 Min. :0.000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:2.000
Median :0.0000 Median :0.0000 Median :2.000 Median :2.000
Mean :0.7513 Mean :0.2197 Mean :2.111 Mean :2.219
3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:3.000 3rd Qu.:3.000
Max. :3.0000 Max. :3.0000 Max. :4.000 Max. :4.000
NA's :4 NA's :6 NA's :4 NA's :4
PSS3 PSS4 PSS5 PSS6 PSS7
Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.000
1st Qu.:2.00 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.00 1st Qu.:2.000
Median :3.00 Median :3.000 Median :3.000 Median :3.00 Median :3.000
Mean :2.93 Mean :2.573 Mean :2.664 Mean :2.83 Mean :2.578
3rd Qu.:4.00 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.00 3rd Qu.:3.000
Max. :4.00 Max. :4.000 Max. :4.000 Max. :4.00 Max. :4.000
NA's :4 NA's :6 NA's :3 NA's :3 NA's :4
PSS8 PSS9 PSS10 PSS11
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:2.000
Median :2.000 Median :3.000 Median :2.000 Median :3.000
Mean :2.133 Mean :2.653 Mean :1.992 Mean :2.565
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
NA's :4 NA's :4 NA's :4 NA's :4
PSS12 PSS13 PSS14 HLTH_CPAIN_CAT
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.0000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:0.0000
Median :3.000 Median :3.000 Median :2.000 Median :0.0000
Mean :2.957 Mean :2.547 Mean :2.168 Mean :0.2663
3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:1.0000
Max. :4.000 Max. :4.000 Max. :4.000 Max. :1.0000
NA's :4 NA's :5 NA's :4 NA's :4
HLTH_CDIS_CAT HW_TOTAL W_WC_WI HRS_WEEK
Min. :0.0000 Min. : 0.000 Min. :0.0000 Min. : 0.00
1st Qu.:0.0000 1st Qu.: 3.000 1st Qu.:0.0000 1st Qu.: 21.00
Median :0.0000 Median : 8.000 Median :1.0000 Median : 42.00
Mean :0.1646 Mean : 8.419 Mean :0.7158 Mean : 85.13
3rd Qu.:0.0000 3rd Qu.:12.000 3rd Qu.:1.0000 3rd Qu.:168.00
Max. :1.0000 Max. :27.000 Max. :1.0000 Max. :168.00
NA's :1 NA's :11 NA's :22 NA's :39
dim(merged_df)
[1] 402 41
# Save the merged dataset
write.csv(merged_df, file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS_V2.csv"), row.names = FALSE)
cat("Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V2.csv'.\n")
Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V2.csv'.
# Load the dataset
file4 <- "Q8-26-28.csv"
df <- read.csv(file.path(data_path, file4), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
df2 <- read.csv(file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS_V2.csv"))
# Keep only the ID and question columns of interest; convert values matching missing-data patterns to NA
df <- df %>%
select(ID, MX8_TRUST, MX28_WQ_COMP, MX26_EM_HHW_TYPE) %>%
mutate(across(where(is.character), ~ ifelse(. %in% c("", "N/A", "NA", "pending"), NA, .)))
#remove rows with NAs
df <- na.omit(df)
df <- unique(df)
# Merge with the cleaned dataset by ID
merged_df <- df %>%
full_join(df2, by = "ID")
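# (Illustrative check, not in the original pipeline.) Because df was reduced
# with na.omit(), some participants in the cleaned dataset have no Q8/Q26/Q28
# answers; counting them shows how many NAs the full_join() introduces.
cat("Participants without Q8/Q26/Q28 responses:", sum(!(df2$ID %in% df$ID)), "\n")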
# Recode MX8_TRUST: Yes = 0, Neutral = 1, No = 2
merged_df <- merged_df %>%
mutate(across(all_of("MX8_TRUST"), ~ case_when(
. == "Yes" ~ "0",
. == "Neutral" ~ "1",
. == "No" ~ "2",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "Yes", "Neutral", or "No"
TRUE ~ NA_character_
)))
# Recode MX28_WQ_COMP: Worse = 0, Same = 1, Better = 2
merged_df <- merged_df %>%
mutate(across(all_of("MX28_WQ_COMP"), ~ case_when(
. == "Worse" ~ "0",
. == "Same" ~ "1",
. == "Better" ~ "2",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "Worse", "Same", or "Better"
TRUE ~ NA_character_
)))
# Recode MX26_EM_HHW_TYPE: positive = 0, negative = 1
merged_df <- merged_df %>%
mutate(across(all_of("MX26_EM_HHW_TYPE"), ~ case_when(
. == "positive" ~ "0",
. == "negative" ~ "1",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "positive" or "negative"
TRUE ~ NA_character_
)))
merged_df$MX8_TRUST <- as.factor(merged_df$MX8_TRUST)
merged_df$MX28_WQ_COMP <- as.factor(merged_df$MX28_WQ_COMP)
merged_df$MX26_EM_HHW_TYPE <- as.factor(merged_df$MX26_EM_HHW_TYPE)
merged_df$PSS_TOTAL = rowSums(merged_df[(25 + c(1,2,3,8,11,12,14))]) - rowSums(merged_df[(25 + c(4,5,6,7,9,10,13))])
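# The line above selects the PSS items by position (offset 25, i.e. PSS1 is the
# 26th column), which breaks silently if the column order ever changes. An
# equivalent, name-based version (a sketch that reproduces the same values):
pss_pos <- paste0("PSS", c(1, 2, 3, 8, 11, 12, 14))
pss_neg <- paste0("PSS", c(4, 5, 6, 7, 9, 10, 13))
pss_total_check <- rowSums(merged_df[pss_pos]) - rowSums(merged_df[pss_neg])
all(pss_total_check == merged_df$PSS_TOTAL, na.rm = TRUE)   # should be TRUE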
# Print merged dataset dimensions
cat("Merged dataset dimensions:", dim(merged_df), "\n")
Merged dataset dimensions: 402 45
head(merged_df, 20)
ID MX8_TRUST MX28_WQ_COMP MX26_EM_HHW_TYPE D_YRBR D_LOC_TIME D_AGE D_HH_SIZE
1 1 2 0 0 1987 35 35 4
2 2 2 2 0 1990 32 32 12
3 3 0 1 0 1992 8 30 7
4 4 2 0 1 1982 32 40 4
5 5 2 2 1 1976 45 46 4
6 6 2 0 1 1990 8 32 6
7 7 2 1 1 1997 NA 25 4
8 8 2 1 1 1985 10 37 4
9 9 2 1 1 1982 40 40 6
10 10 2 0 0 1979 43 49 3
11 11 2 0 1 1992 21 30 10
12 12 2 1 0 1996 26 26 17
13 13 2 1 1 1998 24 24 6
14 15 2 2 0 1997 13 25 5
15 16 2 1 1 1990 14 32 5
16 17 2 0 1 1988 34 34 6
17 18 2 0 1 1978 NA 44 NA
18 19 2 0 1 1991 10 31 4
19 20 2 0 1 1992 11 30 10
20 21 2 1 0 1985 34 37 4
D_CHLD HLTH_SMK SES_SC_Total SEASON W_WS_LOC HW_WORRY HW_INTERR HW_CLOTHES
1 0 0 149 1 1 2 0 0
2 2 1 196 1 1 0 0 0
3 2 0 52 1 1 0 0 0
4 1 0 214 1 1 0 0 0
5 NA 0 117 1 1 2 1 1
6 1 NA 220 1 1 1 1 2
7 2 NA 130 1 1 0 1 1
8 2 0 NA 1 1 2 2 2
9 4 0 71 1 1 0 2 0
10 1 0 117 1 1 2 2 2
11 2 0 NA 1 1 1 0 0
12 2 0 NA 1 1 1 1 0
13 2 0 159 1 1 1 1 0
14 1 1 113 1 1 0 0 1
15 NA 0 104 1 1 3 2 2
16 3 0 127 1 1 1 1 2
17 NA NA NA 1 1 0 1 1
18 2 0 110 1 1 1 1 2
19 2 0 199 1 1 2 2 1
20 2 0 86 1 1 0 0 0
HW_PLANS HW_FOOD HW_HANDS HW_BODY HW_DRINK HW_ANGRY HW_SLEEP HW_NONE
1 0 0 0 0 0 1 0 1
2 0 0 0 1 0 2 0 2
3 0 0 0 0 0 1 0 0
4 0 0 0 0 0 1 0 0
5 1 1 0 1 0 1 0 3
6 1 3 1 1 0 1 0 2
7 1 1 1 1 0 1 0 1
8 0 0 2 0 0 0 0 0
9 0 0 0 0 0 1 0 0
10 0 0 0 0 0 0 0 0
11 1 1 0 2 0 3 0 3
12 0 0 0 0 0 1 0 1
13 1 0 0 0 0 0 1 1
14 0 0 0 0 0 0 0 0
15 0 1 0 3 0 3 0 3
16 1 0 0 1 0 0 0 0
17 0 0 0 0 1 2 1 1
18 1 0 0 1 0 0 0 2
19 2 1 0 2 1 2 0 0
20 0 0 0 0 0 1 0 0
HW_SHAME PSS1 PSS2 PSS3 PSS4 PSS5 PSS6 PSS7 PSS8 PSS9 PSS10 PSS11 PSS12
1 0 2 3 2 4 3 2 2 1 2 3 1 4
2 0 2 2 3 3 3 4 3 2 3 2 2 4
3 0 2 2 2 2 2 3 3 2 3 2 2 2
4 1 2 3 2 3 3 3 3 1 3 2 2 2
5 1 2 2 3 2 2 2 3 2 2 1 3 3
6 2 2 2 3 2 3 3 4 1 3 3 2 3
7 1 2 2 2 2 3 2 2 2 2 2 2 2
8 0 2 1 3 3 3 2 3 1 3 3 4 2
9 0 2 3 2 3 2 2 2 3 3 2 2 2
10 0 1 2 1 3 3 3 3 2 3 0 2 3
11 0 3 3 4 4 2 1 2 3 3 1 4 3
12 1 3 4 4 1 2 2 2 2 2 1 4 3
13 0 4 3 4 4 2 3 4 2 1 1 3 3
14 0 3 3 3 2 2 2 1 2 1 1 3 3
15 2 3 3 2 2 1 3 3 1 2 2 2 3
16 0 2 0 2 3 3 3 3 2 3 2 2 3
17 0 2 1 2 3 2 2 2 3 2 2 3 3
18 0 1 3 3 3 2 2 2 3 2 3 3 3
19 1 1 2 4 2 1 3 3 2 3 2 3 3
20 0 2 2 2 3 2 4 3 2 3 2 2 2
PSS13 PSS14 HLTH_CPAIN_CAT HLTH_CDIS_CAT HW_TOTAL W_WC_WI HRS_WEEK PSS_TOTAL
1 2 1 0 1 4 1 168.00 -4
2 3 2 0 0 5 0 35.00 -4
3 3 2 0 0 1 1 NA -4
4 3 1 0 0 2 1 28.00 -7
5 2 3 1 0 12 1 156.00 4
6 1 2 0 0 15 1 12.00 -4
7 2 2 1 0 9 1 11.00 -1
8 2 2 0 1 8 1 39.00 -4
9 2 2 1 0 3 1 5.25 0
10 3 2 0 0 6 1 24.50 -5
11 4 1 1 0 11 1 20.00 4
12 2 4 0 0 5 1 77.00 12
13 2 3 0 0 5 1 21.00 5
14 2 3 0 0 1 1 84.00 9
15 2 4 0 0 19 1 35.00 3
16 3 3 0 0 6 1 9.00 -6
17 1 2 1 0 7 1 17.50 2
18 3 2 1 0 8 1 8.00 1
19 2 2 0 0 14 1 42.00 1
20 2 2 0 0 1 1 21.00 -5
summary(merged_df)
ID MX8_TRUST MX28_WQ_COMP MX26_EM_HHW_TYPE D_YRBR
Min. : 1.0 0 : 72 0 :144 0 :130 Min. :1976
1st Qu.:101.2 1 : 5 1 :157 1 :260 1st Qu.:1984
Median :201.5 2 :313 2 : 92 other: 3 Median :1990
Mean :248.3 other: 3 NA's: 9 NA's : 9 Mean :1990
3rd Qu.:396.8 NA's : 9 3rd Qu.:1996
Max. :497.0 Max. :2005
NA's :17
D_LOC_TIME D_AGE D_HH_SIZE D_CHLD
Min. : 2.00 Min. :18.00 Min. : 2.000 Min. :0.000
1st Qu.:13.00 1st Qu.:26.00 1st Qu.: 4.000 1st Qu.:1.000
Median :23.00 Median :32.00 Median : 5.000 Median :2.000
Mean :22.46 Mean :32.11 Mean : 5.578 Mean :1.857
3rd Qu.:31.00 3rd Qu.:38.00 3rd Qu.: 6.000 3rd Qu.:3.000
Max. :46.00 Max. :49.00 Max. :40.000 Max. :8.000
NA's :36 NA's :18 NA's :23 NA's :24
HLTH_SMK SES_SC_Total SEASON W_WS_LOC
Min. :0.0000 Min. : 25.0 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:104.2 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :129.5 Median :1.0000 Median :1.0000
Mean :0.1698 Mean :133.2 Mean :0.5088 Mean :0.5063
3rd Qu.:0.0000 3rd Qu.:159.0 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :263.0 Max. :1.0000 Max. :1.0000
NA's :78 NA's :52 NA's :3 NA's :3
HW_WORRY HW_INTERR HW_CLOTHES HW_PLANS
Min. :0.00 Min. :0.000 Min. :0.0000 Min. :0.0000
1st Qu.:0.00 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.00 Median :1.000 Median :1.0000 Median :1.0000
Mean :1.16 Mean :1.316 Mean :0.9874 Mean :0.8571
3rd Qu.:2.00 3rd Qu.:2.000 3rd Qu.:2.0000 3rd Qu.:2.0000
Max. :3.00 Max. :3.000 Max. :3.0000 Max. :3.0000
NA's :3 NA's :3 NA's :4 NA's :3
HW_FOOD HW_HANDS HW_BODY HW_DRINK
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.4422 Mean :0.2682 Mean :0.6792 Mean :0.4261
3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :3.0000 Max. :3.0000 Max. :3.0000 Max. :3.0000
NA's :4 NA's :3 NA's :3 NA's :3
HW_ANGRY HW_SLEEP HW_NONE HW_SHAME
Min. :0.00 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.00 Median :0.0000 Median :0.0000 Median :0.0000
Mean :1.04 Mean :0.2638 Mean :0.7513 Mean :0.2197
3rd Qu.:2.00 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :3.00 Max. :2.0000 Max. :3.0000 Max. :3.0000
NA's :4 NA's :4 NA's :4 NA's :6
PSS1 PSS2 PSS3 PSS4 PSS5
Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.00 1st Qu.:2.000 1st Qu.:2.000
Median :2.000 Median :2.000 Median :3.00 Median :3.000 Median :3.000
Mean :2.111 Mean :2.219 Mean :2.93 Mean :2.573 Mean :2.664
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.00 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000 Max. :4.00 Max. :4.000 Max. :4.000
NA's :4 NA's :4 NA's :4 NA's :6 NA's :3
PSS6 PSS7 PSS8 PSS9 PSS10
Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.00 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000
Median :3.00 Median :3.000 Median :2.000 Median :3.000 Median :2.000
Mean :2.83 Mean :2.578 Mean :2.133 Mean :2.653 Mean :1.992
3rd Qu.:4.00 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.00 Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
NA's :3 NA's :4 NA's :4 NA's :4 NA's :4
PSS11 PSS12 PSS13 PSS14
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000
Median :3.000 Median :3.000 Median :3.000 Median :2.000
Mean :2.565 Mean :2.957 Mean :2.547 Mean :2.168
3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
NA's :4 NA's :4 NA's :5 NA's :4
HLTH_CPAIN_CAT HLTH_CDIS_CAT HW_TOTAL W_WC_WI
Min. :0.0000 Min. :0.0000 Min. : 0.000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 3.000 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median : 8.000 Median :1.0000
Mean :0.2663 Mean :0.1646 Mean : 8.419 Mean :0.7158
3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:12.000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.0000 Max. :27.000 Max. :1.0000
NA's :4 NA's :1 NA's :11 NA's :22
HRS_WEEK PSS_TOTAL
Min. : 0.00 Min. :-19.0000
1st Qu.: 21.00 1st Qu.: -6.0000
Median : 42.00 Median : 0.0000
Mean : 85.13 Mean : -0.7215
3rd Qu.:168.00 3rd Qu.: 4.0000
Max. :168.00 Max. : 19.0000
NA's :39 NA's :7
dim(merged_df)
[1] 402 45
# Save the merged dataset
write.csv(merged_df, file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS_V3.csv"), row.names = FALSE)
cat("Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V3.csv'.\n")
Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V3.csv'.
# Load the dataset
file5 <- "Q9-10-11-29-31.csv"
df <- read.csv(file.path(data_path, file5), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
df2 <- read.csv(file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS_V3.csv"))
summary(df)
ID MX9_DRINK_W MX9_DRINK_BOTPUR MX9_DRINK_BOT
Min. : 1.00 Length:396 Length:396 Length:396
1st Qu.: 99.75 Class :character Class :character Class :character
Median :198.50 Mode :character Mode :character Mode :character
Mean :245.05
3rd Qu.:393.25
Max. :492.00
MX10_WSTORAGE MX11_WCOST MX11_WCOST_2MNTHS MX11_WCOST_2MNTHS_NUM
Length:396 Length:396 Length:396 Length:396
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
MX11_WCOST_NUL MX29_IDL_WS MX29_IDL_WS_CAT MX29_IDL_WS_LKLY
Length:396 Length:396 Length:396 Length:396
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
MX29_IDL_WS_INFO MX31_WCOST_BOTL
Length:396 Length:396
Class :character Class :character
Mode :character Mode :character
head(df)
ID MX9_DRINK_W
1 1 <NA>
2 2 a) agua del suministro publico,E) Garrafon de purificadora
3 3 E) Garrafon de purificadora
4 4 E) Garrafon de purificadora
5 5 E) Garrafon de purificadora
6 6 E) Garrafon de purificadora,G) Botellas de Agua
MX9_DRINK_BOTPUR MX9_DRINK_BOT
1 <NA> <NA>
2 <NA> <NA>
3 <NA> <NA>
4 <NA> <NA>
5 <NA> <NA>
6 <NA> <NA>
MX10_WSTORAGE MX11_WCOST
1 <NA> <NA>
2 B) Tinaco,C) Cisterna,D) Tambo <NA>
3 B) Tinaco,C) Cisterna <NA>
4 B) Tinaco,C) Cisterna <NA>
5 C) Cisterna,D) Tambo <NA>
6 B) Tinaco,C) Cisterna,D) Tambo,F) Cubeta sin tapa <NA>
MX11_WCOST_2MNTHS MX11_WCOST_2MNTHS_NUM MX11_WCOST_NUL MX29_IDL_WS
1 <NA> <NA> <NA> <NA>
2 0 0 yes <NA>
3 60 60 no <NA>
4 0 0 yes <NA>
5 DK DK no <NA>
6 0 0 yes <NA>
MX29_IDL_WS_CAT MX29_IDL_WS_LKLY MX29_IDL_WS_INFO MX31_WCOST_BOTL
1 <NA> <NA> <NA> <NA>
2 quality n/a <NA> doubled
3 quantity no <NA> 7--> 15
4 quantity no <NA> no change
5 quality n/a <NA> 8--> 15
6 quality no <NA> 11--> 18
# Keep only the ID and question columns of interest; convert values matching missing-data patterns to NA
df <- df %>%
select(ID, MX9_DRINK_W, MX10_WSTORAGE) %>%
mutate(across(where(is.character), ~ ifelse(. %in% c("", "N/A", "NA", "pending"), NA, .)))
#remove rows with NAs
df <- na.omit(df)
df <- unique(df)
dim(df)
[1] 379 3
# Merge with the cleaned dataset by ID
merged_df <- df2 %>%
full_join(df, by = "ID")
# Print merged dataset dimensions
cat("Merged dataset dimensions:", dim(merged_df), "\n")
Merged dataset dimensions: 402 47
head(merged_df, 20)
ID MX8_TRUST MX28_WQ_COMP MX26_EM_HHW_TYPE D_YRBR D_LOC_TIME D_AGE D_HH_SIZE
1 1 2 0 0 1987 35 35 4
2 2 2 2 0 1990 32 32 12
3 3 0 1 0 1992 8 30 7
4 4 2 0 1 1982 32 40 4
5 5 2 2 1 1976 45 46 4
6 6 2 0 1 1990 8 32 6
7 7 2 1 1 1997 NA 25 4
8 8 2 1 1 1985 10 37 4
9 9 2 1 1 1982 40 40 6
10 10 2 0 0 1979 43 49 3
11 11 2 0 1 1992 21 30 10
12 12 2 1 0 1996 26 26 17
13 13 2 1 1 1998 24 24 6
14 15 2 2 0 1997 13 25 5
15 16 2 1 1 1990 14 32 5
16 17 2 0 1 1988 34 34 6
17 18 2 0 1 1978 NA 44 NA
18 19 2 0 1 1991 10 31 4
19 20 2 0 1 1992 11 30 10
20 21 2 1 0 1985 34 37 4
D_CHLD HLTH_SMK SES_SC_Total SEASON W_WS_LOC HW_WORRY HW_INTERR HW_CLOTHES
1 0 0 149 1 1 2 0 0
2 2 1 196 1 1 0 0 0
3 2 0 52 1 1 0 0 0
4 1 0 214 1 1 0 0 0
5 NA 0 117 1 1 2 1 1
6 1 NA 220 1 1 1 1 2
7 2 NA 130 1 1 0 1 1
8 2 0 NA 1 1 2 2 2
9 4 0 71 1 1 0 2 0
10 1 0 117 1 1 2 2 2
11 2 0 NA 1 1 1 0 0
12 2 0 NA 1 1 1 1 0
13 2 0 159 1 1 1 1 0
14 1 1 113 1 1 0 0 1
15 NA 0 104 1 1 3 2 2
16 3 0 127 1 1 1 1 2
17 NA NA NA 1 1 0 1 1
18 2 0 110 1 1 1 1 2
19 2 0 199 1 1 2 2 1
20 2 0 86 1 1 0 0 0
HW_PLANS HW_FOOD HW_HANDS HW_BODY HW_DRINK HW_ANGRY HW_SLEEP HW_NONE
1 0 0 0 0 0 1 0 1
2 0 0 0 1 0 2 0 2
3 0 0 0 0 0 1 0 0
4 0 0 0 0 0 1 0 0
5 1 1 0 1 0 1 0 3
6 1 3 1 1 0 1 0 2
7 1 1 1 1 0 1 0 1
8 0 0 2 0 0 0 0 0
9 0 0 0 0 0 1 0 0
10 0 0 0 0 0 0 0 0
11 1 1 0 2 0 3 0 3
12 0 0 0 0 0 1 0 1
13 1 0 0 0 0 0 1 1
14 0 0 0 0 0 0 0 0
15 0 1 0 3 0 3 0 3
16 1 0 0 1 0 0 0 0
17 0 0 0 0 1 2 1 1
18 1 0 0 1 0 0 0 2
19 2 1 0 2 1 2 0 0
20 0 0 0 0 0 1 0 0
HW_SHAME PSS1 PSS2 PSS3 PSS4 PSS5 PSS6 PSS7 PSS8 PSS9 PSS10 PSS11 PSS12
1 0 2 3 2 4 3 2 2 1 2 3 1 4
2 0 2 2 3 3 3 4 3 2 3 2 2 4
3 0 2 2 2 2 2 3 3 2 3 2 2 2
4 1 2 3 2 3 3 3 3 1 3 2 2 2
5 1 2 2 3 2 2 2 3 2 2 1 3 3
6 2 2 2 3 2 3 3 4 1 3 3 2 3
7 1 2 2 2 2 3 2 2 2 2 2 2 2
8 0 2 1 3 3 3 2 3 1 3 3 4 2
9 0 2 3 2 3 2 2 2 3 3 2 2 2
10 0 1 2 1 3 3 3 3 2 3 0 2 3
11 0 3 3 4 4 2 1 2 3 3 1 4 3
12 1 3 4 4 1 2 2 2 2 2 1 4 3
13 0 4 3 4 4 2 3 4 2 1 1 3 3
14 0 3 3 3 2 2 2 1 2 1 1 3 3
15 2 3 3 2 2 1 3 3 1 2 2 2 3
16 0 2 0 2 3 3 3 3 2 3 2 2 3
17 0 2 1 2 3 2 2 2 3 2 2 3 3
18 0 1 3 3 3 2 2 2 3 2 3 3 3
19 1 1 2 4 2 1 3 3 2 3 2 3 3
20 0 2 2 2 3 2 4 3 2 3 2 2 2
PSS13 PSS14 HLTH_CPAIN_CAT HLTH_CDIS_CAT HW_TOTAL W_WC_WI HRS_WEEK PSS_TOTAL
1 2 1 0 1 4 1 168.00 -4
2 3 2 0 0 5 0 35.00 -4
3 3 2 0 0 1 1 NA -4
4 3 1 0 0 2 1 28.00 -7
5 2 3 1 0 12 1 156.00 4
6 1 2 0 0 15 1 12.00 -4
7 2 2 1 0 9 1 11.00 -1
8 2 2 0 1 8 1 39.00 -4
9 2 2 1 0 3 1 5.25 0
10 3 2 0 0 6 1 24.50 -5
11 4 1 1 0 11 1 20.00 4
12 2 4 0 0 5 1 77.00 12
13 2 3 0 0 5 1 21.00 5
14 2 3 0 0 1 1 84.00 9
15 2 4 0 0 19 1 35.00 3
16 3 3 0 0 6 1 9.00 -6
17 1 2 1 0 7 1 17.50 2
18 3 2 1 0 8 1 8.00 1
19 2 2 0 0 14 1 42.00 1
20 2 2 0 0 1 1 21.00 -5
MX9_DRINK_W
1 <NA>
2 a) agua del suministro publico,E) Garrafon de purificadora
3 E) Garrafon de purificadora
4 E) Garrafon de purificadora
5 E) Garrafon de purificadora
6 E) Garrafon de purificadora,G) Botellas de Agua
7 E) Garrafon de purificadora
8 E) Garrafon de purificadora,G) Botellas de Agua
9 a) agua del suministro publico,E) Garrafon de purificadora
10 E) Garrafon de purificadora,G) Botellas de Agua
11 E) Garrafon de purificadora,G) Botellas de Agua
12 E) Garrafon de purificadora
13 E) Garrafon de purificadora,G) Botellas de Agua
14 E) Garrafon de purificadora
15 E) Garrafon de purificadora
16 a) agua del suministro publico,E) Garrafon de purificadora
17 d) Garrafon de marca,E) Garrafon de purificadora
18 E) Garrafon de purificadora,G) Botellas de Agua
19 E) Garrafon de purificadora
20 E) Garrafon de purificadora
MX10_WSTORAGE
1 <NA>
2 B) Tinaco,C) Cisterna,D) Tambo
3 B) Tinaco,C) Cisterna
4 B) Tinaco,C) Cisterna
5 C) Cisterna,D) Tambo
6 B) Tinaco,C) Cisterna,D) Tambo,F) Cubeta sin tapa
7 B) Tinaco,C) Cisterna,E) Cubeta con tapa
8 B) Tinaco,pileta
9 B) Tinaco,C) Cisterna,D) Tambo,E) Cubeta con tapa
10 B) Tinaco,C) Cisterna
11 B) Tinaco,C) Cisterna,D) Tambo
12 B) Tinaco,C) Cisterna,D) Tambo,F) Cubeta sin tapa
13 B) Tinaco,C) Cisterna,D) Tambo
14 B) Tinaco,D) Tambo
15 D) Tambo,E) Cubeta con tapa
16 B) Tinaco,C) Cisterna,D) Tambo
17 B) Tinaco
18 D) Tambo,F) Cubeta sin tapa
19 C) Cisterna,D) Tambo,F) Cubeta sin tapa
20 B) Tinaco
summary(merged_df)
ID MX8_TRUST MX28_WQ_COMP MX26_EM_HHW_TYPE
Min. : 1.0 Length:402 Min. :0.0000 Length:402
1st Qu.:101.2 Class :character 1st Qu.:0.0000 Class :character
Median :201.5 Mode :character Median :1.0000 Mode :character
Mean :248.3 Mean :0.8677
3rd Qu.:396.8 3rd Qu.:1.0000
Max. :497.0 Max. :2.0000
NA's :9
D_YRBR D_LOC_TIME D_AGE D_HH_SIZE
Min. :1976 Min. : 2.00 Min. :18.00 Min. : 2.000
1st Qu.:1984 1st Qu.:13.00 1st Qu.:26.00 1st Qu.: 4.000
Median :1990 Median :23.00 Median :32.00 Median : 5.000
Mean :1990 Mean :22.46 Mean :32.11 Mean : 5.578
3rd Qu.:1996 3rd Qu.:31.00 3rd Qu.:38.00 3rd Qu.: 6.000
Max. :2005 Max. :46.00 Max. :49.00 Max. :40.000
NA's :17 NA's :36 NA's :18 NA's :23
D_CHLD HLTH_SMK SES_SC_Total SEASON
Min. :0.000 Min. :0.0000 Min. : 25.0 Min. :0.0000
1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:104.2 1st Qu.:0.0000
Median :2.000 Median :0.0000 Median :129.5 Median :1.0000
Mean :1.857 Mean :0.1698 Mean :133.2 Mean :0.5088
3rd Qu.:3.000 3rd Qu.:0.0000 3rd Qu.:159.0 3rd Qu.:1.0000
Max. :8.000 Max. :1.0000 Max. :263.0 Max. :1.0000
NA's :24 NA's :78 NA's :52 NA's :3
W_WS_LOC HW_WORRY HW_INTERR HW_CLOTHES
Min. :0.0000 Min. :0.00 Min. :0.000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:1.000 1st Qu.:0.0000
Median :1.0000 Median :1.00 Median :1.000 Median :1.0000
Mean :0.5063 Mean :1.16 Mean :1.316 Mean :0.9874
3rd Qu.:1.0000 3rd Qu.:2.00 3rd Qu.:2.000 3rd Qu.:2.0000
Max. :1.0000 Max. :3.00 Max. :3.000 Max. :3.0000
NA's :3 NA's :3 NA's :3 NA's :4
HW_PLANS HW_FOOD HW_HANDS HW_BODY
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.0000 Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.8571 Mean :0.4422 Mean :0.2682 Mean :0.6792
3rd Qu.:2.0000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000
Max. :3.0000 Max. :3.0000 Max. :3.0000 Max. :3.0000
NA's :3 NA's :4 NA's :3 NA's :3
HW_DRINK HW_ANGRY HW_SLEEP HW_NONE
Min. :0.0000 Min. :0.00 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :1.00 Median :0.0000 Median :0.0000
Mean :0.4261 Mean :1.04 Mean :0.2638 Mean :0.7513
3rd Qu.:1.0000 3rd Qu.:2.00 3rd Qu.:0.0000 3rd Qu.:1.0000
Max. :3.0000 Max. :3.00 Max. :2.0000 Max. :3.0000
NA's :3 NA's :4 NA's :4 NA's :4
HW_SHAME PSS1 PSS2 PSS3
Min. :0.0000 Min. :0.000 Min. :0.000 Min. :0.00
1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.00
Median :0.0000 Median :2.000 Median :2.000 Median :3.00
Mean :0.2197 Mean :2.111 Mean :2.219 Mean :2.93
3rd Qu.:0.0000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.00
Max. :3.0000 Max. :4.000 Max. :4.000 Max. :4.00
NA's :6 NA's :4 NA's :4 NA's :4
PSS4 PSS5 PSS6 PSS7 PSS8
Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.00 1st Qu.:2.000 1st Qu.:2.000
Median :3.000 Median :3.000 Median :3.00 Median :3.000 Median :2.000
Mean :2.573 Mean :2.664 Mean :2.83 Mean :2.578 Mean :2.133
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.00 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000 Max. :4.00 Max. :4.000 Max. :4.000
NA's :6 NA's :3 NA's :3 NA's :4 NA's :4
PSS9 PSS10 PSS11 PSS12
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000
Median :3.000 Median :2.000 Median :3.000 Median :3.000
Mean :2.653 Mean :1.992 Mean :2.565 Mean :2.957
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.000
Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
NA's :4 NA's :4 NA's :4 NA's :4
PSS13 PSS14 HLTH_CPAIN_CAT HLTH_CDIS_CAT
Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.0000
1st Qu.:2.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :3.000 Median :2.000 Median :0.0000 Median :0.0000
Mean :2.547 Mean :2.168 Mean :0.2663 Mean :0.1646
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :4.000 Max. :4.000 Max. :1.0000 Max. :1.0000
NA's :5 NA's :4 NA's :4 NA's :1
HW_TOTAL W_WC_WI HRS_WEEK PSS_TOTAL
Min. : 0.000 Min. :0.0000 Min. : 0.00 Min. :-19.0000
1st Qu.: 3.000 1st Qu.:0.0000 1st Qu.: 21.00 1st Qu.: -6.0000
Median : 8.000 Median :1.0000 Median : 42.00 Median : 0.0000
Mean : 8.419 Mean :0.7158 Mean : 85.13 Mean : -0.7215
3rd Qu.:12.000 3rd Qu.:1.0000 3rd Qu.:168.00 3rd Qu.: 4.0000
Max. :27.000 Max. :1.0000 Max. :168.00 Max. : 19.0000
NA's :11 NA's :22 NA's :39 NA's :7
MX9_DRINK_W MX10_WSTORAGE
Length:402 Length:402
Class :character Class :character
Mode :character Mode :character
dim(merged_df)
[1] 402 47
# Save the merged dataset
write.csv(merged_df, file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS_V4.csv"), row.names = FALSE)
cat("Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V4.csv'.\n")
Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V4.csv'.
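With the final cleaned dataset saved, headline figures like those quoted in the abstract can be spot-checked. The snippet below is purely illustrative and rests on an assumption: that the 57% "frustrated or angry" figure corresponds to a non-zero score on the HWISE anger item, which may not be the exact definition used for the meeting.
# Share of respondents reporting any anger/frustration about their water situation
# (illustrative only; assumes "frustrated or angry" maps to HW_ANGRY > 0)
mean(merged_df$HW_ANGRY > 0, na.rm = TRUE)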
sessionInfo()
R version 4.4.3 (2025-02-28)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Detroit
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyr_1.3.1 dplyr_1.1.4
loaded via a namespace (and not attached):
[1] jsonlite_1.8.9 compiler_4.4.3 promises_1.3.0 tidyselect_1.2.1
[5] Rcpp_1.0.13-1 stringr_1.5.1 git2r_0.35.0 later_1.3.2
[9] jquerylib_0.1.4 yaml_2.3.10 fastmap_1.2.0 R6_2.5.1
[13] generics_0.1.3 workflowr_1.7.1 knitr_1.49 tibble_3.2.1
[17] rprojroot_2.0.4 bslib_0.8.0 pillar_1.9.0 rlang_1.1.4
[21] utf8_1.2.4 cachem_1.1.0 stringi_1.8.4 httpuv_1.6.15
[25] xfun_0.49 fs_1.6.5 sass_0.4.9 cli_3.6.3
[29] withr_3.0.2 magrittr_2.0.3 digest_0.6.37 rstudioapi_0.17.1
[33] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.1 glue_1.8.0
[37] whisker_0.4.1 fansi_1.0.6 purrr_1.0.2 rmarkdown_2.29
[41] tools_4.4.3 pkgconfig_2.0.3 htmltools_0.5.8.1