Last updated: 2024-07-06
Checks: 7 0
Knit directory: MATHPOP/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20240702)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version eda1cf3. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: analysis/.DS_Store
Ignored: data/.DS_Store
Ignored: data/GC_prob/.DS_Store
Ignored: data/point_source_data/.DS_Store
Ignored: data/prob_GC_data/.DS_Store
Ignored: data/sim/.DS_Store
Ignored: data/v10acs/.DS_Store
Ignored: data/v10wfc3/.DS_Store
Ignored: data/v11acs/.DS_Store
Ignored: data/v11wfc3/.DS_Store
Ignored: data/v12acs/.DS_Store
Ignored: data/v12wfc3/.DS_Store
Ignored: data/v13acs/.DS_Store
Ignored: data/v13wfc3/.DS_Store
Ignored: data/v14acs/.DS_Store
Ignored: data/v14wfc3/.DS_Store
Ignored: data/v15acs/.DS_Store
Ignored: data/v15wfc3/.DS_Store
Ignored: data/v6acs/.DS_Store
Ignored: data/v6wfc3/.DS_Store
Ignored: data/v7acs/.DS_Store
Ignored: data/v8acs/.DS_Store
Ignored: data/v9acs/.DS_Store
Ignored: data/v9wfc3/.DS_Store
Untracked files:
Untracked: GC_prob-tikzDictionary
Untracked: analysis/GC_prob-tikzDictionary
Untracked: analysis/data_process-tikzDictionary
Untracked: analysis/method_compare-tikzDictionary
Untracked: analysis/posterior-tikzDictionary
Untracked: analysis/vignette-tikzDictionary
Untracked: data_process-tikzDictionary
Untracked: method_compare-tikzDictionary
Untracked: posterior-tikzDictionary
Untracked: vignette-tikzDictionary
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/data_process.Rmd
) and HTML
(docs/data_process.html
) files. If you’ve configured a
remote Git repository (see ?wflow_git_remote
), click on the
hyperlinks in the table below to view the files as they were in that
past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 0b7fd29 | david.li | 2024-07-06 | wflow_git_commit(all = T) |
html | 0a9d403 | david.li | 2024-07-05 | Build site. |
html | c1fd386 | david.li | 2024-07-04 | Build site. |
Rmd | 33fc566 | david.li | 2024-07-04 | wflow_git_commit(all = T) |
html | 33fc566 | david.li | 2024-07-04 | wflow_git_commit(all = T) |
html | 6ff1ccc | david.li | 2024-07-04 | Build site. |
html | 65a7d8c | david.li | 2024-07-04 | Build site. |
Rmd | 98834e5 | david.li | 2024-07-04 | wflow_publish(c("analysis/_site.yml", "analysis/posterior.Rmd", |
This vignette illustrates the methods used to obtain the probabilistic catalogs obtained using point sources from DOLPHOT and SExtractor. It also reproduces Figure 1 and Figure 8 in the original paper of Li et al. (2024)
Read in the point source data obtained by DOLPHOT and SExtractor:
# Load packages
library(tidyverse)
library(tikzDevice)
library(mixtools)
library(Rcpp)
sourceCpp('code/cpp_help_func.cpp')
source('code/help_functions.R')
# Read in DOLPHOT point source data
fnames <- list.files('data/point_source_data/')
ps_harris_fnames <- fnames[!grepl('Jans', fnames)]
dat_harris <- data.frame()
for (i in ps_harris_fnames) {
df <- read_csv(paste0('data/point_source_data/',i))
df <- bind_cols(df, field = rep(i, nrow(df)))
dat_harris <- bind_rows(dat_harris, df)
}
# Read in SExtractor point source data
ps_Jans_fnames <- fnames[grepl('Jans', fnames)]
dat_Jans <- read_csv(paste0('data/point_source_data/', ps_Jans_fnames))
We then carry out pre-processing of the point source data. For the DOLPHOT data, we remove point sources with color \(\mathrm{F475W} - \mathrm{F814W} < 0.0~\mathrm{mag}\) and \(\mathrm{F814W} < 22.0~\mathrm{mag}\). Pre-processing of SExtractor data is not needed as the method to obtain the probabilistic GC catalog under SExtractor point source is different.
CMD_harris <- mutate(dat_harris, C = F475W - F814W, M = F814W) %>%
dplyr::select(x, y, RA, DEC, C, M, F814W, F475W, field) %>%
filter(M > 22 & C > 0)
For the DOLPHOT point source, the finite-mixture method to obtain the
probabilistic GC catalog is illustrated here.
We have carried out this method 500 times where each time the color and
magnitude are jittered with their measurement uncertainties. As it is
impossible to run and save such a big computational task remotely, we
have saved the locally-run results in the file
DOL_GC_prob_mvnpEM_500_iter.RDS
under the
data/GC_prob/
folder of the Github repo. Read in this data
and process it:
# read in results from 500 iteration of the finite-mixture method
GC_prob_ls_Harris <- readRDS('data/GC_prob/DOL_GC_prob_mvnpEM_500_iter.RDS')
# extract the probability a source is a GC
GC_probability_Harris <- do.call(cbind,lapply(GC_prob_ls_Harris, function(x) x$posterior[, which.max(x$lambdahat)]))
# name the column
colnames(GC_probability_Harris) <- c(paste0('p', 1:500))
# add it to the previous point source data.frame
CMD_harris <- bind_cols(CMD_harris, GC_probability_Harris)
# obtain summary statistics of the probability from the 500 iterations
CMD_harris$p <- rowMeans(GC_probability_Harris)
CMD_harris$p_sd <- apply(GC_probability_Harris, 1, sd)
After obtaining the point source data with all 500 probabilities, we save them into individual files based on their imaging field ID for later inference.
# Only run once
for (i in ps_harris_fnames) {
write_csv(filter(CMD_harris, field == i), paste0('data/prob_GC_data/',unlist(strsplit(i, split = '.', fixed = T))[1],'_pGC.csv'))
}
For the SExtractor data, a different parametric finite-mixture model
was used. For a detailed explanation of the method, see Appendix B.2 of
the original Li et al. (2024) paper. We have written a function called
mix_func()
in the code/help_functions.R
to fit
the parametric finite-mixture model.
Similar to DOLPHOT data, we apply our proposed parametric
finite-mixture model to the measurement uncertainty jittered
color-magnitude data for 500 times, this is carried out by the function
meas_uncertain_mix_func()
:
# This may take a while, but still much faster than the previous non-parametric finite mixture model used for DOLPHOT
Jans_res <- meas_uncertain_mix_func(dat = dat_Jans, n_iter = 500)
Get the results from the previous run, and attach it to the
SExtractor point source data.frame
.
# get the probabilities
p_mat <- Jans_res$prob
# summary statistics for the probability a point source is a GC
dat_Jans$p <- rowMeans(Jans_res$prob)
dat_Jans$p_sd <- apply(Jans_res$prob, 1, sd)
# restructure the data frame for later inference
dat_Jans_save <- dat_Jans %>%
dplyr::select(x, y, RA, DEC, C, F814W, field) %>%
mutate(M = F814W) %>%
dplyr::select(x, y, RA, DEC, C, M, field)
colnames(Jans_res$prob) <- c(paste0('p', 1:500))
Jans_res$prob <- data.frame(Jans_res$prob)
dat_Jans_save <- bind_cols(dat_Jans_save, p_mat)
Again, we save the probabilistic GC catalog from SExtractor into individual files based on their imaging field ID for later inference.
# save the probabilistic GC catalog into individual files (only run once)
for (i in unique(dat_Jans_save$field)) {
write_csv(filter(dat_Jans_save, field == i), paste0('data/prob_GC_data/', i,'_pGC_Jans.csv'))
}
Now we can plot the obtained data and regenerate Figure 1 in the original Li et al. (2024) paper. Note that due to Latex memory constraints, not all point sources can be plotted in the figure, so we only plotted half of the sources.
# combine the two point source data in one data frame
frac_CMD_harris <- CMD_harris %>%
dplyr::select(C, M, p, p_sd) %>%
mutate(data = 'DOLPHOT')
frac_CMD_Jans <- dat_Jans %>%
dplyr::select(C, F814W, p, p_sd) %>%
mutate(data = 'SExtractor (J24)', M = F814W) %>%
dplyr::select(C, M, p, p_sd, data)
# select half of the sources to be plotted
set.seed(123456)
frac_CMD <- bind_rows(frac_CMD_harris[sample(1:nrow(frac_CMD_harris), 0.5*nrow(frac_CMD_harris), replace = F),],
frac_CMD_Jans[sample(1:nrow(frac_CMD_Jans), 0.5*nrow(frac_CMD_Jans), replace = F),])
ggplot(frac_CMD, aes(C, M, color = p)) + geom_point(aes(size = p_sd*2.5)) +
scale_size_identity() +
scale_y_reverse() + theme_minimal() +
xlab('F475W - F814W') + ylab('F814W') +
scale_colour_viridis_c(name = 'p(GC)') +
geom_segment(data = data.frame(xmin = c(0.8, 0.8, 2.4, 0.8),
xmax = c(2.4, 0.8, 2.4, 2.4),
ymin = c(26.3, 22, 22, 22),
ymax = c(26.3, 26.3, 26.3, 22),
data = 'SExtractor (J24)'),
aes(x = xmin, xend = xmax, y = ymin, yend = ymax), colour = 'red') +
geom_segment(data = data.frame(xmin = c(1, 1, 2.4, 1),
xmax = c(2.4, 1, 2.4, 2.4),
ymin = c(26.5, 22, 22, 22),
ymax = c(26.5, 26.5, 26.5, 22),
data = 'DOLPHOT'),
aes(x = xmin, xend = xmax, y = ymin, yend = ymax), colour = 'red') +
facet_wrap(.~data)
We can now also plot the results for the fitted parametric finite-mixture model for SExtractor point source data. These are effectively Figure 8(a) and 8(b) in the paper. First, get the estimated parameters from the fitted results:
# average of the fitted mixture model model parameters
w <- mean(Jans_res$par[,1])
wr <- mean(Jans_res$par[,2])
GCLF_TO <- 26.3 # canonical GCLF TO
GCLF_sig <- mean(Jans_res$par[,3])
GC_color_mu_r <- mean(Jans_res$par[,4])
GC_color_sig_r <- mean(Jans_res$par[,5])
GC_color_mu_b <- mean(Jans_res$par[,6])
GC_color_sig_b <- mean(Jans_res$par[,7])
mu <- mean(Jans_res$par[,8])
sigma <- mean(Jans_res$par[,9])
Phi <- Phi_f_cpp(26.69, GCLF_TO, GCLF_sig, a = 6.56)
Generate Figure 8a in the original paper of Li et al. (2024):
ggplot(Jans_res$sim, aes(F814W)) + geom_histogram(aes(y = after_stat(count)/sum(count)/.05253/2),breaks = seq(22, 27.2, length = 50), alpha = 0.5) +
geom_function(fun = function(x) w*dnorm(x, GCLF_TO, GCLF_sig)*f_cpp(x, 26.69, 6.56)/Phi + (1-w)*dnorm(x, mu, sigma)) +
geom_function(fun = function(x) w*dnorm(x, GCLF_TO, GCLF_sig)*f_cpp(x, 26.69, 6.56)/Phi, color = "#EBCC2A") +
geom_function(fun = function(x) (1-w)*dnorm(x, mu, sigma), color = 'purple') + ylab('Density') + theme_minimal()
Generate Figure 8b in the original paper of Li et al. (2024):
ggplot(Jans_res$sim, aes(C)) + geom_histogram(aes(y = after_stat(count)/sum(count)/0.032),breaks = seq(0.8, 2.4, length = 50), alpha = 0.5) +
geom_function(fun = function(x) w*(wr*dnorm(x, GC_color_mu_r, GC_color_sig_r) + (1-wr)*dnorm(x, GC_color_mu_b, GC_color_sig_b)) + (1-w)*dunif(x, 0.8, 2.4)) +
geom_function(fun = function(x) w*(wr*dnorm(x, GC_color_mu_r, GC_color_sig_r) + (1-wr)*dnorm(x, GC_color_mu_b, GC_color_sig_b)), color = "#EBCC2A") +
geom_function(fun = function(x) w*wr*dnorm(x, GC_color_mu_r, GC_color_sig_r), color = '#3B9AB2') +
geom_function(fun = function(x) w*(1-wr)*dnorm(x, GC_color_mu_b, GC_color_sig_b), color = "#FF0000") +
geom_function(fun = function(x) (1-w)*dunif(x, 0.8, 2.4), color = 'purple') + ylab('Density') + theme_minimal() + xlab('F475W - F814W')
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Toronto
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rcpp_1.0.11 mixtools_2.0.0 tikzDevice_0.12.5 lubridate_1.9.3
[5] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
[9] readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.4
[13] tidyverse_2.0.0 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 filehash_2.4-5 viridisLite_0.4.2
[4] farver_2.1.1 fastmap_1.2.0 lazyeval_0.2.2
[7] promises_1.2.1 digest_0.6.36 timechange_0.2.0
[10] lifecycle_1.0.4 qpdf_1.3.3 survival_3.5-7
[13] processx_3.8.2 magrittr_2.0.3 kernlab_0.9-32
[16] compiler_4.3.2 rlang_1.1.4 sass_0.4.7
[19] tools_4.3.2 utf8_1.2.4 yaml_2.3.7
[22] data.table_1.14.8 knitr_1.45 askpass_1.2.0
[25] labeling_0.4.3 htmlwidgets_1.6.2 bit_4.0.5
[28] klippy_0.0.0.9500 withr_2.5.2 grid_4.3.2
[31] fansi_1.0.6 git2r_0.33.0 colorspace_2.1-0
[34] scales_1.3.0 MASS_7.3-60 tinytex_0.48
[37] cli_3.6.1 rmarkdown_2.25 crayon_1.5.2
[40] generics_0.1.3 rstudioapi_0.15.0 httr_1.4.7
[43] tzdb_0.4.0 RcppArmadillo_0.12.6.6.0 cachem_1.0.8
[46] splines_4.3.2 assertthat_0.2.1 parallel_4.3.2
[49] vctrs_0.6.5 Matrix_1.6-3 jsonlite_1.8.7
[52] callr_3.7.3 hms_1.1.3 bit64_4.0.5
[55] magick_2.8.3 plotly_4.10.3 jquerylib_0.1.4
[58] glue_1.6.2 ps_1.7.5 stringi_1.8.4
[61] gtable_0.3.4 later_1.3.1 munsell_0.5.0
[64] pillar_1.9.0 htmltools_0.5.8.1 R6_2.5.1
[67] rprojroot_2.0.4 vroom_1.6.4 evaluate_0.23
[70] lattice_0.22-5 highr_0.10 segmented_2.0-3
[73] httpuv_1.6.12 bslib_0.5.1 nlme_3.1-163
[76] whisker_0.4.1 xfun_0.41 fs_1.6.3
[79] getPass_0.2-4 pkgconfig_2.0.3 pdftools_3.4.0