---
title: "HBA2025_results"
author: "Paloma"
date: "2025-01-27"
output: workflowr::wflow_html
editor_options:
  chunk_output_type: console
---
Last updated: 2025-03-09
Checks: 6 passed, 1 warning
Knit directory: QUAIL-Mex/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you'll want to first commit it to the Git repo. If you're still working on the analysis, you can ignore this warning. When you're finished, you can run wflow_publish to commit the R Markdown file and build the HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it's best to always run the code in an empty environment.
The command set.seed(20241009) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 77cc810. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .RData
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/.RData
Ignored: analysis/.Rhistory
Ignored: analysis/Hrs_by_HWISE score.png
Ignored: code/.DS_Store
Ignored: data/.DS_Store
Unstaged changes:
Modified: analysis/HBA2025_cleaning.Rmd
Modified: analysis/tests.Rmd
Modified: data/Cleaned_Dataset_Screening_HWISE_PSS_V4.csv
Modified: data/Q9-10-11-29-31.csv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/HBA2025_cleaning.Rmd) and HTML (docs/HBA2025_cleaning.html) files. If you've configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 9502a37 | Paloma | 2025-03-09 | analyses |
html | 9502a37 | Paloma | 2025-03-09 | analyses |
Rmd | 8e33bbc | Paloma | 2025-03-08 | more vars |
html | 8e33bbc | Paloma | 2025-03-08 | more vars |
Rmd | 77cc174 | Paloma | 2025-03-08 | more plots |
html | 77cc174 | Paloma | 2025-03-08 | more plots |
Rmd | 7866aba | Paloma | 2025-03-07 | newplots |
html | 7866aba | Paloma | 2025-03-07 | newplots |
Rmd | 4c407eb | Paloma | 2025-03-07 | add question 28 |
Rmd | 0a00a41 | Paloma | 2025-03-06 | reg_analysis 2 |
html | 0a00a41 | Paloma | 2025-03-06 | reg_analysis 2 |
Rmd | f0811f0 | Paloma | 2025-03-04 | reduced NAs |
html | f0811f0 | Paloma | 2025-03-04 | reduced NAs |
html | 74c067f | Paloma | 2025-03-04 | Update HBA2025_cleaning.html |
Rmd | 7e9bbe9 | Paloma | 2025-03-04 | Merge branch ‘main’ of https://github.com/lasisilab/QUAIL-Mex |
Rmd | 3704a5a | Paloma | 2025-03-04 | add more vars |
html | 3704a5a | Paloma | 2025-03-04 | add more vars |
Rmd | 16af92d | Tina Lasisi | 2025-03-04 | Updating files |
html | 16af92d | Tina Lasisi | 2025-03-04 | Updating files |
Rmd | 1b8f52e | Paloma | 2025-03-03 | cleaning |
Here you will find the code used to obtain the results presented at the 2025 annual meeting of the HBA.
Abstract:
Coping with water insecurity: women’s strategies and emotional responses in Iztapalapa, Mexico City
Water insecurity in urban areas presents distinctive challenges, particularly in marginalized communities. While past studies have documented how households adapt to poor water services, many of these coping strategies come at a significant personal cost. Here we examine the coping strategies and emotional impacts of unreliable water services among 400 women in Iztapalapa, Mexico City. Data were collected through surveys administered in Fall 2022 and Spring 2023. We assessed household water access, water management practices, and emotional responses to local water services.
Results indicate that during acute water shortages, women can spend extended periods (several hours, or sometimes days) waiting for water trucks. Additionally, 57% of respondents reported feeling frustrated or angry about their water situation, while around 20% experienced family conflicts over water use or community-level conflicts around water management, often involving water vendors or government services.
This study offers one of the first in-depth examinations of how water insecurity specifically affects women in Iztapalapa, a densely populated region of Mexico City with severe water access challenges. The findings highlight the urgent need for policy interventions that address water insecurity with a gender-sensitive approach, recognizing the disproportionate burden placed on women as primary water managers in their households.
# Load necessary libraries
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(tidyr)
# Load the dataset
data_path = "data"
# Screening questionnaire data
df = read.csv(file.path(data_path, "01.SCREENING.csv"), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
# Convert all character columns to NA where values match missing patterns
df <- df %>%
mutate(across(where(is.character), ~ ifelse(. %in% c("", "N/A", "NA", "pending"), NA, .)))
# Remove duplicate IDs and keep rows with at least 10 non-NA values
df_filtered <- df %>%
mutate(Non_NA_Count = rowSums(!is.na(.))) %>%
filter(!duplicated(ID) & Non_NA_Count >= 10) %>%
select(-Non_NA_Count) # Remove helper column
# Display the filtered dataset
print(dim(df_filtered)) # Show dimensions for confirmation
[1] 385 35
write.csv(df_filtered, file.path(data_path, "Filtered_Screening.csv"), row.names = FALSE)
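# (Illustrative check, not part of the original pipeline.) Since 385 of the
# rows read in were kept, it can help to see why the remaining rows were
# dropped: duplicated IDs versus rows with fewer than 10 non-NA values.
cat("Rows read in:", nrow(df), "\n")
cat("Duplicated IDs:", sum(duplicated(df$ID)), "\n")
cat("Rows with fewer than 10 non-NA values:", sum(rowSums(!is.na(df)) < 10), "\n")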
d <- df_filtered
# Confirm total number of unique participants
cat("Total unique participants:", length(unique(d$ID)), "\n")
Total unique participants: 385
# Identify missing ID numbers in the first and second trip ranges
missing_first_trip <- setdiff(1:204, d$ID)
cat("Missing IDs in the first trip:", missing_first_trip, "\n")
Missing IDs in the first trip: 71 164
missing_second_trip <- setdiff(301:497, d$ID)
cat("Missing IDs in the second trip:", missing_second_trip, "\n")
Missing IDs in the second trip: 346 347 348 349 350 351 352 354 355 356 357 358 360 375
# Count participants within ID ranges
cat("Num of Participants first trip ≤ 250:", sum(d$ID <= 250, na.rm = TRUE), "\n")
Num of Participants first trip ≤ 250: 202
cat("Num of Participants second trip ≥ 250:", sum(d$ID >= 250, na.rm = TRUE), "\n")
Num of Participants second trip ≥ 250: 183
# Confirm total number of rows
cat("Total number of rows:", nrow(d), "\n")
Total number of rows: 385
# Set ID as row names (ensuring IDs are unique and non-NA)
if (!any(duplicated(d$ID)) && !any(is.na(d$ID))) {
rownames(d) <- as.character(d$ID)
} else {
warning("Some IDs are duplicated or NA, row names were not set.")
}
# Count and report rows containing NAs
num_na_rows <- sum(!complete.cases(d))
cat("Rows with missing values:", num_na_rows, "\n")
Rows with missing values: 385
# Identify columns that start with "SES" and end with "SC"
ses_sc_columns <- grep("^SES.*SC$", names(d), value = TRUE)
# Count total number of NA values in each SES_SC column
na_counts <- colSums(is.na(d[ses_sc_columns]))
# Count total NA values across all SES_SC columns
total_na_count <- sum(na_counts)
# Print results
cat("Number of NA values per SES_SC column:\n")
Number of NA values per SES_SC column:
print(na_counts)
SES_EDU_SC SES_BTHR_SC SES_CAR_SC SES_INT_SC SES_WRK_SC SES_BEDR_SC
14 16 5 2 2 4
cat("\nTotal NA values across all SES_SC columns:", total_na_count, "\n")
Total NA values across all SES_SC columns: 43
# Check if SES_SC columns exist before proceeding
if (length(ses_sc_columns) > 0) {
# Convert SES_SC columns to numeric (handles factors and character types)
d[ses_sc_columns] <- lapply(d[ses_sc_columns], function(x) as.numeric(as.character(x)))
# Sum SES_SC columns row-wise, keeping NA if any SES_SC value is missing
d <- d %>%
rowwise() %>%
mutate(SES_SC_Total = if_else(any(is.na(c_across(all_of(ses_sc_columns)))),
NA_real_,
sum(c_across(all_of(ses_sc_columns)), na.rm = TRUE))) %>%
ungroup()
# Display confirmation message
cat("Added SES_SC_Total column, maintaining NA where any SES_SC column has NA.\n")
} else {
cat("No SES_SC columns found in the dataset.\n")
}
Warning in FUN(X[[i]], ...): NAs introduced by coercion
Warning in FUN(X[[i]], ...): NAs introduced by coercion
Added SES_SC_Total column, maintaining NA where any SES_SC column has NA.
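The two coercion warnings above mean that a couple of SES_*_SC columns contain text entries that cannot be parsed as numbers and were turned into NA. A small diagnostic like the one below (not part of the original pipeline; it reads the untouched values back from df_filtered) lists the offending entries so they can be corrected upstream if desired.
# List the SES_*_SC values that fail numeric coercion
for (col in ses_sc_columns) {
  raw <- df_filtered[[col]]
  bad <- !is.na(raw) & is.na(suppressWarnings(as.numeric(as.character(raw))))
  if (any(bad)) cat(col, ":", unique(as.character(raw[bad])), "\n")
}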
# Count the number of NAs in the SES_SC_Total column
num_na_ses_sc_total <- sum(is.na(d$SES_SC_Total))
# Print the count
cat("Number of NA values in SES_SC_Total:", num_na_ses_sc_total, "\n")
Number of NA values in SES_SC_Total: 36
print(d[is.na(d$SES_SC_Total),c("ID",ses_sc_columns)], n=20)
# A tibble: 36 × 7
ID SES_EDU_SC SES_BTHR_SC SES_CAR_SC SES_INT_SC SES_WRK_SC SES_BEDR_SC
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 8 31 NA 18 31 15 12
2 11 35 NA 0 31 46 23
3 12 35 47 18 31 61 NA
4 18 NA NA NA NA NA NA
5 22 31 NA NA NA NA NA
6 57 31 NA 0 31 15 6
7 58 43 NA 0 31 31 23
8 81 NA 47 0 0 31 23
9 85 31 47 18 31 61 NA
10 95 NA 24 18 0 61 12
11 101 NA 47 0 31 46 17
12 103 NA 24 18 31 31 17
13 107 NA 47 0 31 31 12
14 108 NA 24 18 31 31 12
15 109 35 NA 0 31 46 23
16 125 43 NA 0 31 31 23
17 127 73 NA 0 31 61 23
18 156 NA 0 0 0 31 6
19 182 31 NA 18 31 15 17
20 194 23 NA 0 0 15 6
# ℹ 16 more rows
# Calculate the percentage of missing values per column
missing_percentage <- colMeans(is.na(d)) * 100
# Get columns with more than 10% missing data
columns_to_remove <- names(missing_percentage[missing_percentage > 10])
length(columns_to_remove)
[1] 11
dim(d)
[1] 385 36
# Print the selected columns
cat("Columns with more than 10% missing values:\n")
Columns with more than 10% missing values:
print(columns_to_remove)
[1] "D_LOC" "D_LOC_CODE" "D_LOC_INFO" "HLTH_CPAIN_CAT"
[5] "HLTH_CDIS_CAT" "D_HH_SIZE_INFO" "D_EMP_DAYS" "W_HH_SIZE"
[9] "W_STORAGE" "D_INFO" "SES_INFO"
d <- d %>% select(-all_of(columns_to_remove))
dim(d)
[1] 385 25
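# (Optional diagnostic, not in the original pipeline.) The missing_percentage
# vector computed above is still available, so the dropped columns can be
# ranked by how incomplete they were before deciding whether any are worth
# revisiting later.
round(sort(missing_percentage[columns_to_remove], decreasing = TRUE), 1)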
head(d)
# A tibble: 6 × 25
ID D_YRBR D_AGE D_LOC_TIME HLTH_SMK HLTH_TTM HLTH_MED HLTH_CPAIN HLTH_CDIS
<int> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1987 35 35 "No" No "No" No "Tuve co…
2 2 1990 32 32 "Yes" No "Lorata… No "No"
3 3 1992 30 8 "No" No "No" No "No"
4 4 1982 40 32 "No" No "No" No "No"
5 5 1976 46 45 "No" No "Yes,om… Yes "Dolor c…
6 6 1990 32 8 "a veces… no "no" No "no"
# ℹ 16 more variables: D_HH_SIZE <chr>, D_CHLD <chr>, D_EMP_TYPE <chr>,
# SES_EDU_CAT <chr>, SES_EDU_SC <dbl>, SES_BTHR_CAT <chr>, SES_BTHR_SC <dbl>,
# SES_CAR_CAT <chr>, SES_CAR_SC <dbl>, SES_INT_CAT <chr>, SES_INT_SC <dbl>,
# SES_WRK_CAT <chr>, SES_WRK_SC <dbl>, SES_BEDR_CAT <chr>, SES_BEDR_SC <dbl>,
# SES_SC_Total <dbl>
# Function to convert non-numeric values to "other"
convert_to_other <- function(x) {
ifelse(grepl("^[0-9]+$", x) | x == "other" | is.na(x), x, "other")
}
# Keep purely numeric entries in the selected columns and collapse free-text answers to "other"
selected_columns <- c("D_CHLD", "D_HH_SIZE")
d <- d %>%
mutate(across(all_of(selected_columns), ~ convert_to_other(as.character(.))))
# Specify the columns where transformation should be applied
columns_to_transform <- c("HLTH_SMK", "HLTH_TTM", "HLTH_CPAIN")
# Convert "yes" to 1, "no" to 0, and everything else to "other"
d <- d %>%
mutate(across(all_of(columns_to_transform), ~ case_when(
. == "Yes" ~ "1",
. == "No" ~ "0",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "yes" or "no"
TRUE ~ NA_character_
)))
# Convert numeric-coded columns to factors where applicable
d <- d %>%
mutate(across(where(~ all(. %in% c("0", "1", "other", NA_character_))), as.factor)) # Ensure consistency
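# (Quick sanity check, not part of the original analysis.) Tabulating the
# recoded yes/no columns, including NAs, confirms that every original value
# ended up as "0", "1", "other", or NA.
lapply(d[columns_to_transform], table, useNA = "ifany")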
# Convert variables to numeric when appropriate
## Pending: D_YRBR, D_LOC_TIME, D_CHLD, D_HH_SIZE
print(d)
# A tibble: 385 × 25
ID D_YRBR D_AGE D_LOC_TIME HLTH_SMK HLTH_TTM HLTH_MED HLTH_CPAIN HLTH_CDIS
<int> <chr> <int> <chr> <fct> <fct> <chr> <fct> <chr>
1 1 1987 35 35 0 0 "No" 0 "Tuve co…
2 2 1990 32 32 1 0 "Lorata… 0 "No"
3 3 1992 30 8 0 0 "No" 0 "No"
4 4 1982 40 32 0 0 "No" 0 "No"
5 5 1976 46 45 0 0 "Yes,om… 1 "Dolor c…
6 6 1990 32 8 other other "no" 0 "no"
7 7 1997 25 <NA> other other "no" 1 "dolor c…
8 8 1985 37 10 0 other "no" 0 "No diab…
9 9 1982 40 40 0 0 "No" other "No"
10 10 1979 49 43 0 0 "No" 0 "No"
# ℹ 375 more rows
# ℹ 16 more variables: D_HH_SIZE <chr>, D_CHLD <chr>, D_EMP_TYPE <chr>,
# SES_EDU_CAT <chr>, SES_EDU_SC <dbl>, SES_BTHR_CAT <chr>, SES_BTHR_SC <dbl>,
# SES_CAR_CAT <chr>, SES_CAR_SC <dbl>, SES_INT_CAT <chr>, SES_INT_SC <dbl>,
# SES_WRK_CAT <chr>, SES_WRK_SC <dbl>, SES_BEDR_CAT <chr>, SES_BEDR_SC <dbl>,
# SES_SC_Total <dbl>
summary(d)
ID D_YRBR D_AGE D_LOC_TIME HLTH_SMK
Min. : 1 Length:385 Min. :18.0 Length:385 0 :268
1st Qu.: 98 Class :character 1st Qu.:26.0 Class :character 1 : 55
Median :195 Mode :character Median :32.0 Mode :character other: 60
Mean :245 Mean :32.1 NA's : 2
3rd Qu.:401 3rd Qu.:38.0
Max. :497 Max. :49.0
NA's :2
HLTH_TTM HLTH_MED HLTH_CPAIN HLTH_CDIS
0 :241 Length:385 0 :281 Length:385
other:144 Class :character 1 : 51 Class :character
Mode :character other: 53 Mode :character
D_HH_SIZE D_CHLD D_EMP_TYPE SES_EDU_CAT
Length:385 Length:385 Length:385 Length:385
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
SES_EDU_SC SES_BTHR_CAT SES_BTHR_SC SES_CAR_CAT
Min. : 10.00 Length:385 Min. : 0.00 Length:385
1st Qu.: 31.00 Class :character 1st Qu.: 0.00 Class :character
Median : 31.00 Mode :character Median :24.00 Mode :character
Mean : 36.34 Mean :22.28
3rd Qu.: 43.00 3rd Qu.:24.00
Max. :101.00 Max. :47.00
NA's :15 NA's :17
SES_CAR_SC SES_INT_CAT SES_INT_SC SES_WRK_CAT
Min. : 0.000 Length:385 Min. : 0.00 Length:385
1st Qu.: 0.000 Class :character 1st Qu.:31.00 Class :character
Median : 0.000 Mode :character Median :31.00 Mode :character
Mean : 6.003 Mean :24.12
3rd Qu.:18.000 3rd Qu.:31.00
Max. :37.000 Max. :31.00
NA's :5 NA's :2
SES_WRK_SC SES_BEDR_CAT SES_BEDR_SC SES_SC_Total
Min. : 0.00 Length:385 Min. : 0.00 Min. : 25.0
1st Qu.:15.00 Class :character 1st Qu.:12.00 1st Qu.:104.0
Median :31.00 Mode :character Median :12.00 Median :129.0
Mean :31.37 Mean :13.27 Mean :133.1
3rd Qu.:46.00 3rd Qu.:17.00 3rd Qu.:159.0
Max. :61.00 Max. :23.00 Max. :263.0
NA's :2 NA's :4 NA's :36
# Keep only the relevant, cleaned columns
columns_to_keep2 <- c("ID", "D_YRBR", "D_LOC_TIME", "D_AGE", "D_HH_SIZE", "D_CHLD", "HLTH_SMK", "SES_SC_Total")
d <- d %>% select(all_of(columns_to_keep2))
write.csv(d, file.path(data_path, "00.SCREENING_V2.csv"), row.names = FALSE)
file2 <- "02.HWISE_PSS.csv"
file3 <- "Chronic_pain_illness.csv"
df1 <- d
df2 <- read.csv(file.path(data_path, file2), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
df3 <- read.csv(file.path(data_path, file3), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
# Identify the common column for merging
common_column <- "ID" # Change this if needed
# Ensure the common column is of the same type in all datasets
df1[[common_column]] <- as.character(df1[[common_column]])
df2[[common_column]] <- as.character(df2[[common_column]])
df3[[common_column]] <- as.character(df3[[common_column]])
# Merge datasets sequentially by the common column
merged_df <- df1 %>%
full_join(df2, by = common_column) %>%
full_join(df3, by = common_column)
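# (Illustrative check, not in the original pipeline.) full_join() keeps every
# ID from every source, so it helps to know how much the three ID sets overlap
# before interpreting the merged row count.
cat("IDs in screening data:", length(unique(df1$ID)), "\n")
cat("IDs in HWISE/PSS data:", length(unique(df2$ID)), "\n")
cat("IDs in chronic pain/illness data:", length(unique(df3$ID)), "\n")
cat("IDs present in all three:", length(Reduce(intersect, list(df1$ID, df2$ID, df3$ID))), "\n")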
colnames(merged_df)
[1] "ID" "D_YRBR" "D_LOC_TIME" "D_AGE"
[5] "D_HH_SIZE" "D_CHLD" "HLTH_SMK" "SES_SC_Total"
[9] "SEASON" "W_WS_LOC" "HW_TOTAL" "HW_WORRY"
[13] "HW_INTERR" "HW_CLOTHES" "HW_PLANS" "HW_FOOD"
[17] "HW_HANDS" "HW_BODY" "HW_DRINK" "HW_ANGRY"
[21] "HW_SLEEP" "HW_NONE" "HW_SHAME" "HW_INFO"
[25] "PSS1" "PSS2" "PSS3" "PSS4"
[29] "PSS5" "PSS6" "PSS7" "PSS8"
[33] "PSS9" "PSS10" "PSS11" "PSS12"
[37] "PSS13" "PSS14" "PSS_TOTAL" "PSS_INFO"
[41] "HLTH_CPAIN_CAT" "HLTH_CDIS_CAT"
cols_to_remove<- c("HW_INFO", "PSS_INFO", "PSS_TOTAL", "HW_TOTAL")
merged_df <- merged_df %>% select(-all_of(cols_to_remove))
##### Calculate final HW score
# Identify columns that start with "HW_"
hw_columns <- grep("^HW_", names(merged_df), value = TRUE)
# Sum up all HW_ columns row-wise, keeping NA if any HW_ column has NA
merged_df <- merged_df %>%
rowwise() %>%
mutate(HW_TOTAL = if_else(any(is.na(c_across(all_of(hw_columns)))),
NA_real_,
sum(c_across(all_of(hw_columns)),
na.rm = TRUE))) %>%
ungroup()
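# (Sketch of an equivalent, vectorised alternative.) rowSums() without na.rm
# already returns NA whenever any HW_ item is missing, so it reproduces the
# rowwise()/c_across() logic above in a single call.
hw_total_check <- rowSums(merged_df[hw_columns])
all(hw_total_check == merged_df$HW_TOTAL, na.rm = TRUE)   # should be TRUE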
# Print merged dataset dimensions
cat("Merged dataset dimensions:", dim(merged_df), "\n")
Merged dataset dimensions: 400 39
print(merged_df, n=20)
# A tibble: 400 × 39
ID D_YRBR D_LOC_TIME D_AGE D_HH_SIZE D_CHLD HLTH_SMK SES_SC_Total SEASON
<chr> <chr> <chr> <int> <chr> <chr> <fct> <dbl> <chr>
1 1 1987 35 35 4 0 0 149 Oct-Dec
2 2 1990 32 32 12 2 1 196 Oct-Dec
3 3 1992 8 30 7 2 0 52 Oct-Dec
4 4 1982 32 40 4 1 0 214 Oct-Dec
5 5 1976 45 46 4 other 0 117 Oct-Dec
6 6 1990 8 32 6 1 other 220 Oct-Dec
7 7 1997 <NA> 25 4 2 other 130 Oct-Dec
8 8 1985 10 37 4 2 0 NA Oct-Dec
9 9 1982 40 40 6 4 0 71 Oct-Dec
10 10 1979 43 49 3 1 0 117 Oct-Dec
11 11 1992 21 30 10 2 0 NA Oct-Dec
12 12 1996 26 26 17 2 0 NA Oct-Dec
13 13 1998 24 24 6 2 0 159 Oct-Dec
14 14 2000 22 22 6 0 0 132 Oct-Dec
15 15 1997 13 25 5 1 1 113 Oct-Dec
16 16 1990 14 32 5 other 0 104 Oct-Dec
17 17 1988 34 34 6 3 0 127 Oct-Dec
18 18 1978 <NA> 44 <NA> <NA> <NA> NA Oct-Dec
19 19 1991 10 31 4 2 0 110 Oct-Dec
20 20 1992 11 30 10 2 0 199 Oct-Dec
# ℹ 380 more rows
# ℹ 30 more variables: W_WS_LOC <chr>, HW_WORRY <int>, HW_INTERR <int>,
# HW_CLOTHES <int>, HW_PLANS <int>, HW_FOOD <int>, HW_HANDS <int>,
# HW_BODY <int>, HW_DRINK <int>, HW_ANGRY <int>, HW_SLEEP <int>,
# HW_NONE <int>, HW_SHAME <int>, PSS1 <int>, PSS2 <int>, PSS3 <int>,
# PSS4 <int>, PSS5 <int>, PSS6 <int>, PSS7 <int>, PSS8 <int>, PSS9 <int>,
# PSS10 <int>, PSS11 <int>, PSS12 <int>, PSS13 <int>, PSS14 <int>, …
# Save the merged dataset
write.csv(merged_df, file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS.csv"), row.names = FALSE)
cat("Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS.csv'.\n")
Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS.csv'.
# Load the cleaned dataset and the weekly water-supply hours data
df1 = read.csv(file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS.csv"), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
df2 = read.csv(file.path(data_path, "hours_water_supply.csv"), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
# Merge the two datasets by ID
merged_df <- df1 %>%
full_join(df2, by = "ID")
seas <- c("SEASON")
merged_df <- merged_df %>%
mutate(across(all_of(seas), ~ case_when(
. == "Oct-Dec" ~ "1",
. == "Apr-Jun" ~ "0",
. == "Oct-Dec" ~ "1" has already matched above, so assign "other" to any remaining non-missing value that is not "Oct-Dec" or "Apr-Jun"
!is.na(.) ~ "other",
TRUE ~ NA_character_
)))
# Specify the columns where the transformation should be applied
columns_to_transform <- c("W_WC_WI")
# Convert "Intermitente" (intermittent supply) to 1, "Continua" (continuous supply) to 0, and everything else to "other"
merged_df <- merged_df %>%
mutate(across(all_of(columns_to_transform), ~ case_when(
. == "Intermitente" ~ "1",
. == "Continua" ~ "0",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "Intermitente" or "Continua"
TRUE ~ NA_character_
)))
columns_to_transform <- c("W_WS_LOC")
# Convert "WI" to 1, "WS" to 0, and everything else to "other"
merged_df <- merged_df %>%
mutate(across(all_of(columns_to_transform), ~ case_when(
. == "WI" ~ "1",
. == "WS" ~ "0",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "WI" or "WS"
TRUE ~ NA_character_
)))
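# The three recodes above (SEASON, W_WC_WI, W_WS_LOC) follow the same pattern:
# two known labels become "1"/"0" and anything else non-missing becomes "other".
# A small helper (hypothetical, not in the original code) would avoid the
# repetition; shown here as a sketch only.
recode_binary <- function(x, one, zero) {
  dplyr::case_when(
    x == one  ~ "1",
    x == zero ~ "0",
    !is.na(x) ~ "other",
    TRUE      ~ NA_character_
  )
}
# e.g. the W_WS_LOC block above could have been written as:
# merged_df <- merged_df %>% mutate(W_WS_LOC = recode_binary(W_WS_LOC, "WI", "WS"))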
# Define the list of variables to convert to numeric
col_to_num <- c("D_YRBR", "D_LOC_TIME", "D_HH_SIZE", "D_CHLD", "HRS_WEEK", "HLTH_SMK", "W_WS_LOC", "SEASON", "W_WC_WI")
# Convert the specified columns to numeric
merged_df <- merged_df %>%
mutate(across(all_of(col_to_num), ~as.numeric(as.character(.)), .names = "{.col}"))
Warning: There were 7 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(all_of(col_to_num), ~as.numeric(as.character(.)), .names
= "{.col}")`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 6 remaining warnings.
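# (Diagnostic sketch, not in the original pipeline.) These warnings are
# expected: values recoded to "other" earlier in the pipeline, plus any
# free-text entries, cannot be parsed as numbers and become NA. Counting NAs
# in the converted columns shows where that happened.
colSums(is.na(merged_df[col_to_num]))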
# Print merged dataset dimensions
cat("Merged dataset dimensions:", dim(merged_df), "\n")
Merged dataset dimensions: 402 41
head(merged_df, 20)
ID D_YRBR D_LOC_TIME D_AGE D_HH_SIZE D_CHLD HLTH_SMK SES_SC_Total SEASON
1 1 1987 35 35 4 0 0 149 1
2 2 1990 32 32 12 2 1 196 1
3 3 1992 8 30 7 2 0 52 1
4 4 1982 32 40 4 1 0 214 1
5 5 1976 45 46 4 NA 0 117 1
6 6 1990 8 32 6 1 NA 220 1
7 7 1997 NA 25 4 2 NA 130 1
8 8 1985 10 37 4 2 0 NA 1
9 9 1982 40 40 6 4 0 71 1
10 10 1979 43 49 3 1 0 117 1
11 11 1992 21 30 10 2 0 NA 1
12 12 1996 26 26 17 2 0 NA 1
13 13 1998 24 24 6 2 0 159 1
14 14 2000 22 22 6 0 0 132 1
15 15 1997 13 25 5 1 1 113 1
16 16 1990 14 32 5 NA 0 104 1
17 17 1988 34 34 6 3 0 127 1
18 18 1978 NA 44 NA NA NA NA 1
19 19 1991 10 31 4 2 0 110 1
20 20 1992 11 30 10 2 0 199 1
W_WS_LOC HW_WORRY HW_INTERR HW_CLOTHES HW_PLANS HW_FOOD HW_HANDS HW_BODY
1 1 2 0 0 0 0 0 0
2 1 0 0 0 0 0 0 1
3 1 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0
5 1 2 1 1 1 1 0 1
6 1 1 1 2 1 3 1 1
7 1 0 1 1 1 1 1 1
8 1 2 2 2 0 0 2 0
9 1 0 2 0 0 0 0 0
10 1 2 2 2 0 0 0 0
11 1 1 0 0 1 1 0 2
12 1 1 1 0 0 0 0 0
13 1 1 1 0 1 0 0 0
14 1 1 2 0 0 0 0 0
15 1 0 0 1 0 0 0 0
16 1 3 2 2 0 1 0 3
17 1 1 1 2 1 0 0 1
18 1 0 1 1 0 0 0 0
19 1 1 1 2 1 0 0 1
20 1 2 2 1 2 1 0 2
HW_DRINK HW_ANGRY HW_SLEEP HW_NONE HW_SHAME PSS1 PSS2 PSS3 PSS4 PSS5 PSS6
1 0 1 0 1 0 2 3 2 4 3 2
2 0 2 0 2 0 2 2 3 3 3 4
3 0 1 0 0 0 2 2 2 2 2 3
4 0 1 0 0 1 2 3 2 3 3 3
5 0 1 0 3 1 2 2 3 2 2 2
6 0 1 0 2 2 2 2 3 2 3 3
7 0 1 0 1 1 2 2 2 2 3 2
8 0 0 0 0 0 2 1 3 3 3 2
9 0 1 0 0 0 2 3 2 3 2 2
10 0 0 0 0 0 1 2 1 3 3 3
11 0 3 0 3 0 3 3 4 4 2 1
12 0 1 0 1 1 3 4 4 1 2 2
13 0 0 1 1 0 4 3 4 4 2 3
14 0 0 0 0 0 2 2 2 3 3 3
15 0 0 0 0 0 3 3 3 2 2 2
16 0 3 0 3 2 3 3 2 2 1 3
17 0 0 0 0 0 2 0 2 3 3 3
18 1 2 1 1 0 2 1 2 3 2 2
19 0 0 0 2 0 1 3 3 3 2 2
20 1 2 0 0 1 1 2 4 2 1 3
PSS7 PSS8 PSS9 PSS10 PSS11 PSS12 PSS13 PSS14 HLTH_CPAIN_CAT HLTH_CDIS_CAT
1 2 1 2 3 1 4 2 1 0 1
2 3 2 3 2 2 4 3 2 0 0
3 3 2 3 2 2 2 3 2 0 0
4 3 1 3 2 2 2 3 1 0 0
5 3 2 2 1 3 3 2 3 1 0
6 4 1 3 3 2 3 1 2 0 0
7 2 2 2 2 2 2 2 2 1 0
8 3 1 3 3 4 2 2 2 0 1
9 2 3 3 2 2 2 2 2 1 0
10 3 2 3 0 2 3 3 2 0 0
11 2 3 3 1 4 3 4 1 1 0
12 2 2 2 1 4 3 2 4 0 0
13 4 2 1 1 3 3 2 3 0 0
14 3 2 3 3 2 2 3 1 0 0
15 1 2 1 1 3 3 2 3 0 0
16 3 1 2 2 2 3 2 4 0 0
17 3 2 3 2 2 3 3 3 0 0
18 2 3 2 2 3 3 1 2 1 0
19 2 3 2 3 3 3 3 2 1 0
20 3 2 3 2 3 3 2 2 0 0
HW_TOTAL W_WC_WI HRS_WEEK
1 4 1 168.00
2 5 0 35.00
3 1 1 NA
4 2 1 28.00
5 12 1 156.00
6 15 1 12.00
7 9 1 11.00
8 8 1 39.00
9 3 1 5.25
10 6 1 24.50
11 11 1 20.00
12 5 1 77.00
13 5 1 21.00
14 3 1 21.00
15 1 1 84.00
16 19 1 35.00
17 6 1 9.00
18 7 1 17.50
19 8 1 8.00
20 14 1 42.00
summary(merged_df)
ID D_YRBR D_LOC_TIME D_AGE
Min. : 1.0 Min. :1976 Min. : 2.00 Min. :18.00
1st Qu.:101.2 1st Qu.:1984 1st Qu.:13.00 1st Qu.:26.00
Median :201.5 Median :1990 Median :23.00 Median :32.00
Mean :248.3 Mean :1990 Mean :22.46 Mean :32.11
3rd Qu.:396.8 3rd Qu.:1996 3rd Qu.:31.00 3rd Qu.:38.00
Max. :497.0 Max. :2005 Max. :46.00 Max. :49.00
NA's :17 NA's :36 NA's :18
D_HH_SIZE D_CHLD HLTH_SMK SES_SC_Total
Min. : 2.000 Min. :0.000 Min. :0.0000 Min. : 25.0
1st Qu.: 4.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:104.2
Median : 5.000 Median :2.000 Median :0.0000 Median :129.5
Mean : 5.578 Mean :1.857 Mean :0.1698 Mean :133.2
3rd Qu.: 6.000 3rd Qu.:3.000 3rd Qu.:0.0000 3rd Qu.:159.0
Max. :40.000 Max. :8.000 Max. :1.0000 Max. :263.0
NA's :23 NA's :24 NA's :78 NA's :52
SEASON W_WS_LOC HW_WORRY HW_INTERR
Min. :0.0000 Min. :0.0000 Min. :0.00 Min. :0.000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:1.000
Median :1.0000 Median :1.0000 Median :1.00 Median :1.000
Mean :0.5088 Mean :0.5063 Mean :1.16 Mean :1.316
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:2.00 3rd Qu.:2.000
Max. :1.0000 Max. :1.0000 Max. :3.00 Max. :3.000
NA's :3 NA's :3 NA's :3 NA's :3
HW_CLOTHES HW_PLANS HW_FOOD HW_HANDS
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.0000 Median :1.0000 Median :0.0000 Median :0.0000
Mean :0.9874 Mean :0.8571 Mean :0.4422 Mean :0.2682
3rd Qu.:2.0000 3rd Qu.:2.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :3.0000 Max. :3.0000 Max. :3.0000 Max. :3.0000
NA's :4 NA's :3 NA's :4 NA's :3
HW_BODY HW_DRINK HW_ANGRY HW_SLEEP
Min. :0.0000 Min. :0.0000 Min. :0.00 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median :1.00 Median :0.0000
Mean :0.6792 Mean :0.4261 Mean :1.04 Mean :0.2638
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:2.00 3rd Qu.:0.0000
Max. :3.0000 Max. :3.0000 Max. :3.00 Max. :2.0000
NA's :3 NA's :3 NA's :4 NA's :4
HW_NONE HW_SHAME PSS1 PSS2
Min. :0.0000 Min. :0.0000 Min. :0.000 Min. :0.000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:2.000
Median :0.0000 Median :0.0000 Median :2.000 Median :2.000
Mean :0.7513 Mean :0.2197 Mean :2.111 Mean :2.219
3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:3.000 3rd Qu.:3.000
Max. :3.0000 Max. :3.0000 Max. :4.000 Max. :4.000
NA's :4 NA's :6 NA's :4 NA's :4
PSS3 PSS4 PSS5 PSS6 PSS7
Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.000
1st Qu.:2.00 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.00 1st Qu.:2.000
Median :3.00 Median :3.000 Median :3.000 Median :3.00 Median :3.000
Mean :2.93 Mean :2.573 Mean :2.664 Mean :2.83 Mean :2.578
3rd Qu.:4.00 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.00 3rd Qu.:3.000
Max. :4.00 Max. :4.000 Max. :4.000 Max. :4.00 Max. :4.000
NA's :4 NA's :6 NA's :3 NA's :3 NA's :4
PSS8 PSS9 PSS10 PSS11
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:2.000
Median :2.000 Median :3.000 Median :2.000 Median :3.000
Mean :2.133 Mean :2.653 Mean :1.992 Mean :2.565
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
NA's :4 NA's :4 NA's :4 NA's :4
PSS12 PSS13 PSS14 HLTH_CPAIN_CAT
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.0000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:0.0000
Median :3.000 Median :3.000 Median :2.000 Median :0.0000
Mean :2.957 Mean :2.547 Mean :2.168 Mean :0.2663
3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:1.0000
Max. :4.000 Max. :4.000 Max. :4.000 Max. :1.0000
NA's :4 NA's :5 NA's :4 NA's :4
HLTH_CDIS_CAT HW_TOTAL W_WC_WI HRS_WEEK
Min. :0.0000 Min. : 0.000 Min. :0.0000 Min. : 0.00
1st Qu.:0.0000 1st Qu.: 3.000 1st Qu.:0.0000 1st Qu.: 21.00
Median :0.0000 Median : 8.000 Median :1.0000 Median : 42.00
Mean :0.1646 Mean : 8.419 Mean :0.7158 Mean : 85.13
3rd Qu.:0.0000 3rd Qu.:12.000 3rd Qu.:1.0000 3rd Qu.:168.00
Max. :1.0000 Max. :27.000 Max. :1.0000 Max. :168.00
NA's :1 NA's :11 NA's :22 NA's :39
dim(merged_df)
[1] 402 41
# Save the merged dataset
write.csv(merged_df, file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS_V2.csv"), row.names = FALSE)
cat("Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V2.csv'.\n")
Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V2.csv'.
# Load the dataset
file4 <- "Q8-26-28.csv"
df <- read.csv(file.path(data_path, file4), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
df2 <- read.csv(file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS_V2.csv"))
# Keep only the ID and question columns of interest; convert values matching missing-data patterns to NA
df <- df %>%
select(ID, MX8_TRUST, MX28_WQ_COMP, MX26_EM_HHW_TYPE) %>%
mutate(across(where(is.character), ~ ifelse(. %in% c("", "N/A", "NA", "pending"), NA, .)))
#remove rows with NAs
df <- na.omit(df)
df <- unique(df)
# Merge with the cleaned dataset by ID
merged_df <- df %>%
full_join(df2, by = "ID")
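# (Illustrative check, not in the original pipeline.) Because df was reduced
# with na.omit(), some participants in the cleaned dataset have no Q8/Q26/Q28
# answers; counting them shows how many NAs the full_join() introduces.
cat("Participants without Q8/Q26/Q28 responses:", sum(!(df2$ID %in% df$ID)), "\n")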
# Recode MX8_TRUST: Yes = 0, Neutral = 1, No = 2
merged_df <- merged_df %>%
mutate(across(all_of("MX8_TRUST"), ~ case_when(
. == "Yes" ~ "0",
. == "Neutral" ~ "1",
. == "No" ~ "2",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "Yes", "Neutral", or "No"
TRUE ~ NA_character_
)))
# Recode MX28_WQ_COMP: Worse = 0, Same = 1, Better = 2
merged_df <- merged_df %>%
mutate(across(all_of("MX28_WQ_COMP"), ~ case_when(
. == "Worse" ~ "0",
. == "Same" ~ "1",
. == "Better" ~ "2",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "Worse", "Same", or "Better"
TRUE ~ NA_character_
)))
# Recode MX26_EM_HHW_TYPE: positive = 0, negative = 1
merged_df <- merged_df %>%
mutate(across(all_of("MX26_EM_HHW_TYPE"), ~ case_when(
. == "positive" ~ "0",
. == "negative" ~ "1",
!is.na(.) ~ "other", # Assign "other" to all non-missing values that are not "positive" or "negative"
TRUE ~ NA_character_
)))
merged_df$MX8_TRUST <- as.factor(merged_df$MX8_TRUST)
merged_df$MX28_WQ_COMP <- as.factor(merged_df$MX28_WQ_COMP)
merged_df$MX26_EM_HHW_TYPE <- as.factor(merged_df$MX26_EM_HHW_TYPE)
merged_df$PSS_TOTAL = rowSums(merged_df[(25 + c(1,2,3,8,11,12,14))]) - rowSums(merged_df[(25 + c(4,5,6,7,9,10,13))])
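# The line above selects the PSS items by position (offset 25, i.e. PSS1 is the
# 26th column), which breaks silently if the column order ever changes. An
# equivalent, name-based version (a sketch that reproduces the same values):
pss_pos <- paste0("PSS", c(1, 2, 3, 8, 11, 12, 14))
pss_neg <- paste0("PSS", c(4, 5, 6, 7, 9, 10, 13))
pss_total_check <- rowSums(merged_df[pss_pos]) - rowSums(merged_df[pss_neg])
all(pss_total_check == merged_df$PSS_TOTAL, na.rm = TRUE)   # should be TRUE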
# Print merged dataset dimensions
cat("Merged dataset dimensions:", dim(merged_df), "\n")
Merged dataset dimensions: 402 45
head(merged_df, 20)
ID MX8_TRUST MX28_WQ_COMP MX26_EM_HHW_TYPE D_YRBR D_LOC_TIME D_AGE D_HH_SIZE
1 1 2 0 0 1987 35 35 4
2 2 2 2 0 1990 32 32 12
3 3 0 1 0 1992 8 30 7
4 4 2 0 1 1982 32 40 4
5 5 2 2 1 1976 45 46 4
6 6 2 0 1 1990 8 32 6
7 7 2 1 1 1997 NA 25 4
8 8 2 1 1 1985 10 37 4
9 9 2 1 1 1982 40 40 6
10 10 2 0 0 1979 43 49 3
11 11 2 0 1 1992 21 30 10
12 12 2 1 0 1996 26 26 17
13 13 2 1 1 1998 24 24 6
14 15 2 2 0 1997 13 25 5
15 16 2 1 1 1990 14 32 5
16 17 2 0 1 1988 34 34 6
17 18 2 0 1 1978 NA 44 NA
18 19 2 0 1 1991 10 31 4
19 20 2 0 1 1992 11 30 10
20 21 2 1 0 1985 34 37 4
D_CHLD HLTH_SMK SES_SC_Total SEASON W_WS_LOC HW_WORRY HW_INTERR HW_CLOTHES
1 0 0 149 1 1 2 0 0
2 2 1 196 1 1 0 0 0
3 2 0 52 1 1 0 0 0
4 1 0 214 1 1 0 0 0
5 NA 0 117 1 1 2 1 1
6 1 NA 220 1 1 1 1 2
7 2 NA 130 1 1 0 1 1
8 2 0 NA 1 1 2 2 2
9 4 0 71 1 1 0 2 0
10 1 0 117 1 1 2 2 2
11 2 0 NA 1 1 1 0 0
12 2 0 NA 1 1 1 1 0
13 2 0 159 1 1 1 1 0
14 1 1 113 1 1 0 0 1
15 NA 0 104 1 1 3 2 2
16 3 0 127 1 1 1 1 2
17 NA NA NA 1 1 0 1 1
18 2 0 110 1 1 1 1 2
19 2 0 199 1 1 2 2 1
20 2 0 86 1 1 0 0 0
HW_PLANS HW_FOOD HW_HANDS HW_BODY HW_DRINK HW_ANGRY HW_SLEEP HW_NONE
1 0 0 0 0 0 1 0 1
2 0 0 0 1 0 2 0 2
3 0 0 0 0 0 1 0 0
4 0 0 0 0 0 1 0 0
5 1 1 0 1 0 1 0 3
6 1 3 1 1 0 1 0 2
7 1 1 1 1 0 1 0 1
8 0 0 2 0 0 0 0 0
9 0 0 0 0 0 1 0 0
10 0 0 0 0 0 0 0 0
11 1 1 0 2 0 3 0 3
12 0 0 0 0 0 1 0 1
13 1 0 0 0 0 0 1 1
14 0 0 0 0 0 0 0 0
15 0 1 0 3 0 3 0 3
16 1 0 0 1 0 0 0 0
17 0 0 0 0 1 2 1 1
18 1 0 0 1 0 0 0 2
19 2 1 0 2 1 2 0 0
20 0 0 0 0 0 1 0 0
HW_SHAME PSS1 PSS2 PSS3 PSS4 PSS5 PSS6 PSS7 PSS8 PSS9 PSS10 PSS11 PSS12
1 0 2 3 2 4 3 2 2 1 2 3 1 4
2 0 2 2 3 3 3 4 3 2 3 2 2 4
3 0 2 2 2 2 2 3 3 2 3 2 2 2
4 1 2 3 2 3 3 3 3 1 3 2 2 2
5 1 2 2 3 2 2 2 3 2 2 1 3 3
6 2 2 2 3 2 3 3 4 1 3 3 2 3
7 1 2 2 2 2 3 2 2 2 2 2 2 2
8 0 2 1 3 3 3 2 3 1 3 3 4 2
9 0 2 3 2 3 2 2 2 3 3 2 2 2
10 0 1 2 1 3 3 3 3 2 3 0 2 3
11 0 3 3 4 4 2 1 2 3 3 1 4 3
12 1 3 4 4 1 2 2 2 2 2 1 4 3
13 0 4 3 4 4 2 3 4 2 1 1 3 3
14 0 3 3 3 2 2 2 1 2 1 1 3 3
15 2 3 3 2 2 1 3 3 1 2 2 2 3
16 0 2 0 2 3 3 3 3 2 3 2 2 3
17 0 2 1 2 3 2 2 2 3 2 2 3 3
18 0 1 3 3 3 2 2 2 3 2 3 3 3
19 1 1 2 4 2 1 3 3 2 3 2 3 3
20 0 2 2 2 3 2 4 3 2 3 2 2 2
PSS13 PSS14 HLTH_CPAIN_CAT HLTH_CDIS_CAT HW_TOTAL W_WC_WI HRS_WEEK PSS_TOTAL
1 2 1 0 1 4 1 168.00 -4
2 3 2 0 0 5 0 35.00 -4
3 3 2 0 0 1 1 NA -4
4 3 1 0 0 2 1 28.00 -7
5 2 3 1 0 12 1 156.00 4
6 1 2 0 0 15 1 12.00 -4
7 2 2 1 0 9 1 11.00 -1
8 2 2 0 1 8 1 39.00 -4
9 2 2 1 0 3 1 5.25 0
10 3 2 0 0 6 1 24.50 -5
11 4 1 1 0 11 1 20.00 4
12 2 4 0 0 5 1 77.00 12
13 2 3 0 0 5 1 21.00 5
14 2 3 0 0 1 1 84.00 9
15 2 4 0 0 19 1 35.00 3
16 3 3 0 0 6 1 9.00 -6
17 1 2 1 0 7 1 17.50 2
18 3 2 1 0 8 1 8.00 1
19 2 2 0 0 14 1 42.00 1
20 2 2 0 0 1 1 21.00 -5
summary(merged_df)
ID MX8_TRUST MX28_WQ_COMP MX26_EM_HHW_TYPE D_YRBR
Min. : 1.0 0 : 72 0 :144 0 :130 Min. :1976
1st Qu.:101.2 1 : 5 1 :157 1 :260 1st Qu.:1984
Median :201.5 2 :313 2 : 92 other: 3 Median :1990
Mean :248.3 other: 3 NA's: 9 NA's : 9 Mean :1990
3rd Qu.:396.8 NA's : 9 3rd Qu.:1996
Max. :497.0 Max. :2005
NA's :17
D_LOC_TIME D_AGE D_HH_SIZE D_CHLD
Min. : 2.00 Min. :18.00 Min. : 2.000 Min. :0.000
1st Qu.:13.00 1st Qu.:26.00 1st Qu.: 4.000 1st Qu.:1.000
Median :23.00 Median :32.00 Median : 5.000 Median :2.000
Mean :22.46 Mean :32.11 Mean : 5.578 Mean :1.857
3rd Qu.:31.00 3rd Qu.:38.00 3rd Qu.: 6.000 3rd Qu.:3.000
Max. :46.00 Max. :49.00 Max. :40.000 Max. :8.000
NA's :36 NA's :18 NA's :23 NA's :24
HLTH_SMK SES_SC_Total SEASON W_WS_LOC
Min. :0.0000 Min. : 25.0 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:104.2 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :129.5 Median :1.0000 Median :1.0000
Mean :0.1698 Mean :133.2 Mean :0.5088 Mean :0.5063
3rd Qu.:0.0000 3rd Qu.:159.0 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :263.0 Max. :1.0000 Max. :1.0000
NA's :78 NA's :52 NA's :3 NA's :3
HW_WORRY HW_INTERR HW_CLOTHES HW_PLANS
Min. :0.00 Min. :0.000 Min. :0.0000 Min. :0.0000
1st Qu.:0.00 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.00 Median :1.000 Median :1.0000 Median :1.0000
Mean :1.16 Mean :1.316 Mean :0.9874 Mean :0.8571
3rd Qu.:2.00 3rd Qu.:2.000 3rd Qu.:2.0000 3rd Qu.:2.0000
Max. :3.00 Max. :3.000 Max. :3.0000 Max. :3.0000
NA's :3 NA's :3 NA's :4 NA's :3
HW_FOOD HW_HANDS HW_BODY HW_DRINK
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.4422 Mean :0.2682 Mean :0.6792 Mean :0.4261
3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :3.0000 Max. :3.0000 Max. :3.0000 Max. :3.0000
NA's :4 NA's :3 NA's :3 NA's :3
HW_ANGRY HW_SLEEP HW_NONE HW_SHAME
Min. :0.00 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.00 Median :0.0000 Median :0.0000 Median :0.0000
Mean :1.04 Mean :0.2638 Mean :0.7513 Mean :0.2197
3rd Qu.:2.00 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :3.00 Max. :2.0000 Max. :3.0000 Max. :3.0000
NA's :4 NA's :4 NA's :4 NA's :6
PSS1 PSS2 PSS3 PSS4 PSS5
Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.00 1st Qu.:2.000 1st Qu.:2.000
Median :2.000 Median :2.000 Median :3.00 Median :3.000 Median :3.000
Mean :2.111 Mean :2.219 Mean :2.93 Mean :2.573 Mean :2.664
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.00 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000 Max. :4.00 Max. :4.000 Max. :4.000
NA's :4 NA's :4 NA's :4 NA's :6 NA's :3
PSS6 PSS7 PSS8 PSS9 PSS10
Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.00 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000
Median :3.00 Median :3.000 Median :2.000 Median :3.000 Median :2.000
Mean :2.83 Mean :2.578 Mean :2.133 Mean :2.653 Mean :1.992
3rd Qu.:4.00 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.00 Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
NA's :3 NA's :4 NA's :4 NA's :4 NA's :4
PSS11 PSS12 PSS13 PSS14
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:1.000
Median :3.000 Median :3.000 Median :3.000 Median :2.000
Mean :2.565 Mean :2.957 Mean :2.547 Mean :2.168
3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
NA's :4 NA's :4 NA's :5 NA's :4
HLTH_CPAIN_CAT HLTH_CDIS_CAT HW_TOTAL W_WC_WI
Min. :0.0000 Min. :0.0000 Min. : 0.000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 3.000 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median : 8.000 Median :1.0000
Mean :0.2663 Mean :0.1646 Mean : 8.419 Mean :0.7158
3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:12.000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.0000 Max. :27.000 Max. :1.0000
NA's :4 NA's :1 NA's :11 NA's :22
HRS_WEEK PSS_TOTAL
Min. : 0.00 Min. :-19.0000
1st Qu.: 21.00 1st Qu.: -6.0000
Median : 42.00 Median : 0.0000
Mean : 85.13 Mean : -0.7215
3rd Qu.:168.00 3rd Qu.: 4.0000
Max. :168.00 Max. : 19.0000
NA's :39 NA's :7
dim(merged_df)
[1] 402 45
# Save the merged dataset
write.csv(merged_df, file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS_V3.csv"), row.names = FALSE)
cat("Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V3.csv'.\n")
Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V3.csv'.
# Load the dataset
file5 <- "Q9-10-11-29-31.csv"
df <- read.csv(file.path(data_path, file5), stringsAsFactors = FALSE, na.strings = c("", "N/A", "NA", "pending"))
df2 <- read.csv(file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS_V3.csv"))
summary(df)
ID MX9_DRINK_W MX9_DRINK_BOTPUR MX9_DRINK_BOT
Min. : 1.00 Length:396 Length:396 Length:396
1st Qu.: 99.75 Class :character Class :character Class :character
Median :198.50 Mode :character Mode :character Mode :character
Mean :245.05
3rd Qu.:393.25
Max. :492.00
MX10_WSTORAGE MX11_WCOST MX11_WCOST_2MNTHS MX11_WCOST_2MNTHS_NUM
Length:396 Length:396 Length:396 Length:396
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
MX11_WCOST_NUL MX29_IDL_WS MX29_IDL_WS_CAT MX29_IDL_WS_LKLY
Length:396 Length:396 Length:396 Length:396
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
MX29_IDL_WS_INFO MX31_WCOST_BOTL
Length:396 Length:396
Class :character Class :character
Mode :character Mode :character
head(df)
ID MX9_DRINK_W
1 1 <NA>
2 2 a) agua del suministro publico,E) Garrafon de purificadora
3 3 E) Garrafon de purificadora
4 4 E) Garrafon de purificadora
5 5 E) Garrafon de purificadora
6 6 E) Garrafon de purificadora,G) Botellas de Agua
MX9_DRINK_BOTPUR MX9_DRINK_BOT
1 <NA> <NA>
2 <NA> <NA>
3 <NA> <NA>
4 <NA> <NA>
5 <NA> <NA>
6 <NA> <NA>
MX10_WSTORAGE MX11_WCOST
1 <NA> <NA>
2 B) Tinaco,C) Cisterna,D) Tambo <NA>
3 B) Tinaco,C) Cisterna <NA>
4 B) Tinaco,C) Cisterna <NA>
5 C) Cisterna,D) Tambo <NA>
6 B) Tinaco,C) Cisterna,D) Tambo,F) Cubeta sin tapa <NA>
MX11_WCOST_2MNTHS MX11_WCOST_2MNTHS_NUM MX11_WCOST_NUL MX29_IDL_WS
1 <NA> <NA> <NA> <NA>
2 0 0 yes <NA>
3 60 60 no <NA>
4 0 0 yes <NA>
5 DK DK no <NA>
6 0 0 yes <NA>
MX29_IDL_WS_CAT MX29_IDL_WS_LKLY MX29_IDL_WS_INFO MX31_WCOST_BOTL
1 <NA> <NA> <NA> <NA>
2 quality n/a <NA> doubled
3 quantity no <NA> 7--> 15
4 quantity no <NA> no change
5 quality n/a <NA> 8--> 15
6 quality no <NA> 11--> 18
# Keep only the ID and question columns of interest; convert values matching missing-data patterns to NA
df <- df %>%
select(ID, MX9_DRINK_W, MX10_WSTORAGE) %>%
mutate(across(where(is.character), ~ ifelse(. %in% c("", "N/A", "NA", "pending"), NA, .)))
#remove rows with NAs
df <- na.omit(df)
df <- unique(df)
dim(df)
[1] 379 3
# Merge with the cleaned dataset by ID
merged_df <- df2 %>%
full_join(df, by = "ID")
# Print merged dataset dimensions
cat("Merged dataset dimensions:", dim(merged_df), "\n")
Merged dataset dimensions: 402 47
head(merged_df, 20)
ID MX8_TRUST MX28_WQ_COMP MX26_EM_HHW_TYPE D_YRBR D_LOC_TIME D_AGE D_HH_SIZE
1 1 2 0 0 1987 35 35 4
2 2 2 2 0 1990 32 32 12
3 3 0 1 0 1992 8 30 7
4 4 2 0 1 1982 32 40 4
5 5 2 2 1 1976 45 46 4
6 6 2 0 1 1990 8 32 6
7 7 2 1 1 1997 NA 25 4
8 8 2 1 1 1985 10 37 4
9 9 2 1 1 1982 40 40 6
10 10 2 0 0 1979 43 49 3
11 11 2 0 1 1992 21 30 10
12 12 2 1 0 1996 26 26 17
13 13 2 1 1 1998 24 24 6
14 15 2 2 0 1997 13 25 5
15 16 2 1 1 1990 14 32 5
16 17 2 0 1 1988 34 34 6
17 18 2 0 1 1978 NA 44 NA
18 19 2 0 1 1991 10 31 4
19 20 2 0 1 1992 11 30 10
20 21 2 1 0 1985 34 37 4
D_CHLD HLTH_SMK SES_SC_Total SEASON W_WS_LOC HW_WORRY HW_INTERR HW_CLOTHES
1 0 0 149 1 1 2 0 0
2 2 1 196 1 1 0 0 0
3 2 0 52 1 1 0 0 0
4 1 0 214 1 1 0 0 0
5 NA 0 117 1 1 2 1 1
6 1 NA 220 1 1 1 1 2
7 2 NA 130 1 1 0 1 1
8 2 0 NA 1 1 2 2 2
9 4 0 71 1 1 0 2 0
10 1 0 117 1 1 2 2 2
11 2 0 NA 1 1 1 0 0
12 2 0 NA 1 1 1 1 0
13 2 0 159 1 1 1 1 0
14 1 1 113 1 1 0 0 1
15 NA 0 104 1 1 3 2 2
16 3 0 127 1 1 1 1 2
17 NA NA NA 1 1 0 1 1
18 2 0 110 1 1 1 1 2
19 2 0 199 1 1 2 2 1
20 2 0 86 1 1 0 0 0
HW_PLANS HW_FOOD HW_HANDS HW_BODY HW_DRINK HW_ANGRY HW_SLEEP HW_NONE
1 0 0 0 0 0 1 0 1
2 0 0 0 1 0 2 0 2
3 0 0 0 0 0 1 0 0
4 0 0 0 0 0 1 0 0
5 1 1 0 1 0 1 0 3
6 1 3 1 1 0 1 0 2
7 1 1 1 1 0 1 0 1
8 0 0 2 0 0 0 0 0
9 0 0 0 0 0 1 0 0
10 0 0 0 0 0 0 0 0
11 1 1 0 2 0 3 0 3
12 0 0 0 0 0 1 0 1
13 1 0 0 0 0 0 1 1
14 0 0 0 0 0 0 0 0
15 0 1 0 3 0 3 0 3
16 1 0 0 1 0 0 0 0
17 0 0 0 0 1 2 1 1
18 1 0 0 1 0 0 0 2
19 2 1 0 2 1 2 0 0
20 0 0 0 0 0 1 0 0
HW_SHAME PSS1 PSS2 PSS3 PSS4 PSS5 PSS6 PSS7 PSS8 PSS9 PSS10 PSS11 PSS12
1 0 2 3 2 4 3 2 2 1 2 3 1 4
2 0 2 2 3 3 3 4 3 2 3 2 2 4
3 0 2 2 2 2 2 3 3 2 3 2 2 2
4 1 2 3 2 3 3 3 3 1 3 2 2 2
5 1 2 2 3 2 2 2 3 2 2 1 3 3
6 2 2 2 3 2 3 3 4 1 3 3 2 3
7 1 2 2 2 2 3 2 2 2 2 2 2 2
8 0 2 1 3 3 3 2 3 1 3 3 4 2
9 0 2 3 2 3 2 2 2 3 3 2 2 2
10 0 1 2 1 3 3 3 3 2 3 0 2 3
11 0 3 3 4 4 2 1 2 3 3 1 4 3
12 1 3 4 4 1 2 2 2 2 2 1 4 3
13 0 4 3 4 4 2 3 4 2 1 1 3 3
14 0 3 3 3 2 2 2 1 2 1 1 3 3
15 2 3 3 2 2 1 3 3 1 2 2 2 3
16 0 2 0 2 3 3 3 3 2 3 2 2 3
17 0 2 1 2 3 2 2 2 3 2 2 3 3
18 0 1 3 3 3 2 2 2 3 2 3 3 3
19 1 1 2 4 2 1 3 3 2 3 2 3 3
20 0 2 2 2 3 2 4 3 2 3 2 2 2
PSS13 PSS14 HLTH_CPAIN_CAT HLTH_CDIS_CAT HW_TOTAL W_WC_WI HRS_WEEK PSS_TOTAL
1 2 1 0 1 4 1 168.00 -4
2 3 2 0 0 5 0 35.00 -4
3 3 2 0 0 1 1 NA -4
4 3 1 0 0 2 1 28.00 -7
5 2 3 1 0 12 1 156.00 4
6 1 2 0 0 15 1 12.00 -4
7 2 2 1 0 9 1 11.00 -1
8 2 2 0 1 8 1 39.00 -4
9 2 2 1 0 3 1 5.25 0
10 3 2 0 0 6 1 24.50 -5
11 4 1 1 0 11 1 20.00 4
12 2 4 0 0 5 1 77.00 12
13 2 3 0 0 5 1 21.00 5
14 2 3 0 0 1 1 84.00 9
15 2 4 0 0 19 1 35.00 3
16 3 3 0 0 6 1 9.00 -6
17 1 2 1 0 7 1 17.50 2
18 3 2 1 0 8 1 8.00 1
19 2 2 0 0 14 1 42.00 1
20 2 2 0 0 1 1 21.00 -5
MX9_DRINK_W
1 <NA>
2 a) agua del suministro publico,E) Garrafon de purificadora
3 E) Garrafon de purificadora
4 E) Garrafon de purificadora
5 E) Garrafon de purificadora
6 E) Garrafon de purificadora,G) Botellas de Agua
7 E) Garrafon de purificadora
8 E) Garrafon de purificadora,G) Botellas de Agua
9 a) agua del suministro publico,E) Garrafon de purificadora
10 E) Garrafon de purificadora,G) Botellas de Agua
11 E) Garrafon de purificadora,G) Botellas de Agua
12 E) Garrafon de purificadora
13 E) Garrafon de purificadora,G) Botellas de Agua
14 E) Garrafon de purificadora
15 E) Garrafon de purificadora
16 a) agua del suministro publico,E) Garrafon de purificadora
17 d) Garrafon de marca,E) Garrafon de purificadora
18 E) Garrafon de purificadora,G) Botellas de Agua
19 E) Garrafon de purificadora
20 E) Garrafon de purificadora
MX10_WSTORAGE
1 <NA>
2 B) Tinaco,C) Cisterna,D) Tambo
3 B) Tinaco,C) Cisterna
4 B) Tinaco,C) Cisterna
5 C) Cisterna,D) Tambo
6 B) Tinaco,C) Cisterna,D) Tambo,F) Cubeta sin tapa
7 B) Tinaco,C) Cisterna,E) Cubeta con tapa
8 B) Tinaco,pileta
9 B) Tinaco,C) Cisterna,D) Tambo,E) Cubeta con tapa
10 B) Tinaco,C) Cisterna
11 B) Tinaco,C) Cisterna,D) Tambo
12 B) Tinaco,C) Cisterna,D) Tambo,F) Cubeta sin tapa
13 B) Tinaco,C) Cisterna,D) Tambo
14 B) Tinaco,D) Tambo
15 D) Tambo,E) Cubeta con tapa
16 B) Tinaco,C) Cisterna,D) Tambo
17 B) Tinaco
18 D) Tambo,F) Cubeta sin tapa
19 C) Cisterna,D) Tambo,F) Cubeta sin tapa
20 B) Tinaco
summary(merged_df)
ID MX8_TRUST MX28_WQ_COMP MX26_EM_HHW_TYPE
Min. : 1.0 Length:402 Min. :0.0000 Length:402
1st Qu.:101.2 Class :character 1st Qu.:0.0000 Class :character
Median :201.5 Mode :character Median :1.0000 Mode :character
Mean :248.3 Mean :0.8677
3rd Qu.:396.8 3rd Qu.:1.0000
Max. :497.0 Max. :2.0000
NA's :9
D_YRBR D_LOC_TIME D_AGE D_HH_SIZE
Min. :1976 Min. : 2.00 Min. :18.00 Min. : 2.000
1st Qu.:1984 1st Qu.:13.00 1st Qu.:26.00 1st Qu.: 4.000
Median :1990 Median :23.00 Median :32.00 Median : 5.000
Mean :1990 Mean :22.46 Mean :32.11 Mean : 5.578
3rd Qu.:1996 3rd Qu.:31.00 3rd Qu.:38.00 3rd Qu.: 6.000
Max. :2005 Max. :46.00 Max. :49.00 Max. :40.000
NA's :17 NA's :36 NA's :18 NA's :23
D_CHLD HLTH_SMK SES_SC_Total SEASON
Min. :0.000 Min. :0.0000 Min. : 25.0 Min. :0.0000
1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:104.2 1st Qu.:0.0000
Median :2.000 Median :0.0000 Median :129.5 Median :1.0000
Mean :1.857 Mean :0.1698 Mean :133.2 Mean :0.5088
3rd Qu.:3.000 3rd Qu.:0.0000 3rd Qu.:159.0 3rd Qu.:1.0000
Max. :8.000 Max. :1.0000 Max. :263.0 Max. :1.0000
NA's :24 NA's :78 NA's :52 NA's :3
W_WS_LOC HW_WORRY HW_INTERR HW_CLOTHES
Min. :0.0000 Min. :0.00 Min. :0.000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:1.000 1st Qu.:0.0000
Median :1.0000 Median :1.00 Median :1.000 Median :1.0000
Mean :0.5063 Mean :1.16 Mean :1.316 Mean :0.9874
3rd Qu.:1.0000 3rd Qu.:2.00 3rd Qu.:2.000 3rd Qu.:2.0000
Max. :1.0000 Max. :3.00 Max. :3.000 Max. :3.0000
NA's :3 NA's :3 NA's :3 NA's :4
HW_PLANS HW_FOOD HW_HANDS HW_BODY
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.0000 Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.8571 Mean :0.4422 Mean :0.2682 Mean :0.6792
3rd Qu.:2.0000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000
Max. :3.0000 Max. :3.0000 Max. :3.0000 Max. :3.0000
NA's :3 NA's :4 NA's :3 NA's :3
HW_DRINK HW_ANGRY HW_SLEEP HW_NONE
Min. :0.0000 Min. :0.00 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :1.00 Median :0.0000 Median :0.0000
Mean :0.4261 Mean :1.04 Mean :0.2638 Mean :0.7513
3rd Qu.:1.0000 3rd Qu.:2.00 3rd Qu.:0.0000 3rd Qu.:1.0000
Max. :3.0000 Max. :3.00 Max. :2.0000 Max. :3.0000
NA's :3 NA's :4 NA's :4 NA's :4
HW_SHAME PSS1 PSS2 PSS3
Min. :0.0000 Min. :0.000 Min. :0.000 Min. :0.00
1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.00
Median :0.0000 Median :2.000 Median :2.000 Median :3.00
Mean :0.2197 Mean :2.111 Mean :2.219 Mean :2.93
3rd Qu.:0.0000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.00
Max. :3.0000 Max. :4.000 Max. :4.000 Max. :4.00
NA's :6 NA's :4 NA's :4 NA's :4
PSS4 PSS5 PSS6 PSS7 PSS8
Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.00 1st Qu.:2.000 1st Qu.:2.000
Median :3.000 Median :3.000 Median :3.00 Median :3.000 Median :2.000
Mean :2.573 Mean :2.664 Mean :2.83 Mean :2.578 Mean :2.133
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.00 3rd Qu.:3.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000 Max. :4.00 Max. :4.000 Max. :4.000
NA's :6 NA's :3 NA's :3 NA's :4 NA's :4
PSS9 PSS10 PSS11 PSS12
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:2.000 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:2.000
Median :3.000 Median :2.000 Median :3.000 Median :3.000
Mean :2.653 Mean :1.992 Mean :2.565 Mean :2.957
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.000
Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
NA's :4 NA's :4 NA's :4 NA's :4
PSS13 PSS14 HLTH_CPAIN_CAT HLTH_CDIS_CAT
Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.0000
1st Qu.:2.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :3.000 Median :2.000 Median :0.0000 Median :0.0000
Mean :2.547 Mean :2.168 Mean :0.2663 Mean :0.1646
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :4.000 Max. :4.000 Max. :1.0000 Max. :1.0000
NA's :5 NA's :4 NA's :4 NA's :1
HW_TOTAL W_WC_WI HRS_WEEK PSS_TOTAL
Min. : 0.000 Min. :0.0000 Min. : 0.00 Min. :-19.0000
1st Qu.: 3.000 1st Qu.:0.0000 1st Qu.: 21.00 1st Qu.: -6.0000
Median : 8.000 Median :1.0000 Median : 42.00 Median : 0.0000
Mean : 8.419 Mean :0.7158 Mean : 85.13 Mean : -0.7215
3rd Qu.:12.000 3rd Qu.:1.0000 3rd Qu.:168.00 3rd Qu.: 4.0000
Max. :27.000 Max. :1.0000 Max. :168.00 Max. : 19.0000
NA's :11 NA's :22 NA's :39 NA's :7
MX9_DRINK_W MX10_WSTORAGE
Length:402 Length:402
Class :character Class :character
Mode :character Mode :character
dim(merged_df)
[1] 402 47
# Save the merged dataset
write.csv(merged_df, file.path(data_path, "Cleaned_Dataset_Screening_HWISE_PSS_V4.csv"), row.names = FALSE)
cat("Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V4.csv'.\n")
Merged dataset saved as 'Cleaned_Dataset_Screening_HWISE_PSS_V4.csv'.
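With the final cleaned dataset saved, headline figures like those quoted in the abstract can be spot-checked. The snippet below is purely illustrative and rests on an assumption: that the 57% "frustrated or angry" figure corresponds to a non-zero score on the HWISE anger item, which may not be the exact definition used for the meeting.
# Share of respondents reporting any anger/frustration about their water situation
# (illustrative only; assumes "frustrated or angry" maps to HW_ANGRY > 0)
mean(merged_df$HW_ANGRY > 0, na.rm = TRUE)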
sessionInfo()
R version 4.4.3 (2025-02-28)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Detroit
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyr_1.3.1 dplyr_1.1.4
loaded via a namespace (and not attached):
[1] jsonlite_1.8.9 compiler_4.4.3 promises_1.3.0 tidyselect_1.2.1
[5] Rcpp_1.0.13-1 stringr_1.5.1 git2r_0.35.0 later_1.3.2
[9] jquerylib_0.1.4 yaml_2.3.10 fastmap_1.2.0 R6_2.5.1
[13] generics_0.1.3 workflowr_1.7.1 knitr_1.49 tibble_3.2.1
[17] rprojroot_2.0.4 bslib_0.8.0 pillar_1.9.0 rlang_1.1.4
[21] utf8_1.2.4 cachem_1.1.0 stringi_1.8.4 httpuv_1.6.15
[25] xfun_0.49 fs_1.6.5 sass_0.4.9 cli_3.6.3
[29] withr_3.0.2 magrittr_2.0.3 digest_0.6.37 rstudioapi_0.17.1
[33] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.1 glue_1.8.0
[37] whisker_0.4.1 fansi_1.0.6 purrr_1.0.2 rmarkdown_2.29
[41] tools_4.4.3 pkgconfig_2.0.3 htmltools_0.5.8.1