Last updated: 2020-10-24
Knit directory: ebpmf_data_analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2).
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20200511) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version fc174dc. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/ebpmf_wbg_simulation_big2.Rmd) and HTML (docs/ebpmf_wbg_simulation_big2.html) files.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | fc174dc | zihao12 | 2020-10-24 | update ebpmf_wbg_simulation_big2.Rmd |
Two ebpmf_wbg fits on the simulated data are compared: one initialized from the truth and one initialized from a pmf fit (50 iterations). Call them model_from_truth and model_from_pmf respectively.
model_from_truth gets good results. The prior makes sense.
model_from_pmf is bad. It has a much lower ELBO but an even higher expected log-likelihood (which may point to an issue with the pmf fit), which means a very large (bad) KL. It does not learn the structure correctly.
The grid for g in model_from_truth is missing two very large phi values, but the result seems okay.
knitr::opts_chunk$set(message = FALSE, warning = FALSE, autodep = TRUE)
library(ggplot2)
library(Matrix)
source("code/misc.R")
source("code/util.R")
model_from_truth = readRDS("output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter2000_from_truth.Rds")
model_from_pmf = readRDS("output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter2000.Rds")
model_from_pmf_iter1 = readRDS("output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter1.Rds")
truth = readRDS("output/sim/v0.4.5/exper2/truth.sim_bg_block_n1100_p2100_K50.Rds")
init = readRDS("output/sim/v0.4.5/exper2/init.sim_bg_block_n1100_p2100_K50.Rds")
X = read_sim_bag_of_words("output/sim/v0.4.5/exper2/docword.sim_bg_block_n1100_p2100_K50.txt")
See what the data looks like
## note: pheatmap draws with grid graphics, so par(mfrow) has no effect on the two heatmaps below
par(mfrow = c(1,2))
pheatmap(as.matrix(X)[1:100, 1:200], cluster_rows=FALSE, cluster_cols=FALSE,
         main = "X block")
pheatmap(truth$L[1:100,] %*% t(truth$F[1:200, ]), cluster_rows=FALSE, cluster_cols=FALSE,
         main = "deviation matrix block")
progress_df = data.frame(elbo_from_truth = model_from_truth$ELBO,
                         Eloglik_from_truth = model_from_truth$ELBO + model_from_truth$KL,
                         elbo_from_pmf = model_from_pmf$ELBO,
                         Eloglik_from_pmf = model_from_pmf$ELBO + model_from_pmf$KL,
                         iter = 1:length(model_from_pmf$ELBO))
plt = ggplot(data = progress_df) +
  geom_line(aes(iter, elbo_from_truth, color = "elbo_from_truth")) +
  geom_line(aes(iter, Eloglik_from_truth, color = "Eloglik_from_truth")) +
  geom_line(aes(iter, elbo_from_pmf, color = "elbo_from_pmf")) +
  geom_line(aes(iter, Eloglik_from_pmf, color = "Eloglik_from_pmf")) +
  ylab("progress")
print(plt)
progress_df[length(model_from_pmf$ELBO),]
elbo_from_truth Eloglik_from_truth elbo_from_pmf Eloglik_from_pmf iter
2000 -370589.4 -354103.4 -443463.1 -339576.2 2000
model_from_pmf is much worse than model_from_truth in ELBO, which I think is a good indicator of model performance. However, model_from_pmf has an even higher expected log-likelihood (ELBO = E-loglik - KL, matching the code above), so its KL must be very large. This might suggest a very bad local optimum.
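Since the chunk above defines Eloglik as ELBO + KL, the KL term of each fit can be read off the last row of progress_df as Eloglik - ELBO. A minimal sketch that uses only the four numbers printed above (the variable names kl_from_truth and kl_from_pmf are illustrative):

```r
# Final-iteration values copied from the printed row of progress_df
elbo_from_truth    <- -370589.4
Eloglik_from_truth <- -354103.4
elbo_from_pmf      <- -443463.1
Eloglik_from_pmf   <- -339576.2

# The chunk above defines Eloglik = ELBO + KL, so KL = Eloglik - ELBO
kl_from_truth <- Eloglik_from_truth - elbo_from_truth
kl_from_pmf   <- Eloglik_from_pmf - elbo_from_pmf

print(c(kl_from_truth = kl_from_truth, kl_from_pmf = kl_from_pmf))
print(kl_from_pmf / kl_from_truth)
```

The implied KL is roughly 16486 for model_from_truth and 103887 for model_from_pmf, about 6.3 times larger.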
model_from_truth
w = model_from_truth$w
hist(length(w) * w/sum(w), main = "w scaled to have mean 1")
Not surprisingly it uncovers truth very well. Below I look at factor/loading 12.
Uncover top documents well
k = 12
n_top = ncol(truth$top_doc)
l_fitted = model_from_truth$qg$qls_mean[,k]
l_truth = truth$L[,k]
## the red line marks l_truth = 1
plot(l_truth, l_fitted)
abline(v = 1, col = "red")
sort(sort(l_fitted, index.return = TRUE, decreasing = TRUE)$ix[1:n_top])
[1] 221 222 223 224 225 226 227 228 229 230
sort(truth$top_doc[k,])
[1] 221 222 223 224 225 226 227 228 229 230
Uncover top words (identify 17 top words out of 20)
k = 12
n_top = ncol(truth$top_words)
f_fitted = model_from_truth$qg$qfs_mean[,k]
f_truth = truth$F[,k]
## the red line marks f_truth = 1
plot(f_truth, f_fitted)
abline(v = 1, col = "red")
f_fitted_sorted = sort(f_fitted, index.return = TRUE, decreasing = TRUE)
sort(f_fitted_sorted$ix[1:n_top])
[1] 441 442 443 445 446 447 449 450 452 453 454 455 456 457 458
[16] 459 460 1073 2060 2075
sort(truth$top_words[k,])
[1] 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459
[20] 460
## look at the values for top words (look at the last 3 wrong ones)
f_fitted[sort(f_fitted_sorted$ix[1:n_top])]
[1] 79.045095 24.197773 73.251358 26.541303 83.586287 67.092675
[7] 119.780659 51.007861 65.587827 69.154921 86.199267 20.213477
[13] 30.458866 46.638130 115.936507 46.273014 43.845567 1.003489
[19] 1.003440 1.003590
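The count of 17 out of 20 can be checked directly from the two index vectors printed above. A minimal sketch with the indices copied by hand (the names fitted_top, true_top, recovered, missed and spurious are illustrative, not objects from the analysis):

```r
# Indices copied from the printed output above (factor k = 12)
fitted_top <- c(441, 442, 443, 445, 446, 447, 449, 450, 452, 453,
                454, 455, 456, 457, 458, 459, 460, 1073, 2060, 2075)
true_top <- 441:460

recovered <- intersect(fitted_top, true_top)  # true top words that were found
missed    <- setdiff(true_top, fitted_top)    # true top words that were not found
spurious  <- setdiff(fitted_top, true_top)    # selected words that are not true top words

print(length(recovered))
print(missed)
print(spurious)
```

The three spurious indices (1073, 2060, 2075) are the ones with fitted values of about 1.003 in the output above, so they only enter the top 20 because words 444, 448 and 451 were ranked even lower.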
g
The fitted priors capture the proportions of top words and top documents pretty well.
## pi = 0.01 for phi = 100, and 0.99 for phi = 0.001 (truth: around 0.01 are top doc)
Pi_L = get_prior_summary(model_from_truth$qg$gls, log10 = TRUE, return_matrix = TRUE)
## pi around 0.01 for phi = 100, and 0.99 for phi = 0.001 (truth: around 0.01 are top words)
Pi_F = get_prior_summary(model_from_truth$qg$gfs, log10 = TRUE, return_matrix = TRUE)
model_from_pmf
w
Different from truth (all 1)
model_from_pmf$w
[1] 7.344054 7.351531 7.792479 9.372576 7.904134 8.211023 6.368980
[8] 5.493236 5.939650 6.930396 6.943949 8.330629 8.334705 5.426788
[15] 7.269157 9.001968 6.606669 10.968828 7.538840 6.265577 7.287651
[22] 7.152494 8.178605 7.717499 8.361270 5.740166 5.704718 6.840604
[29] 6.543005 7.457368 6.833842 6.263324 6.631474 6.296690 8.866826
[36] 8.296635 8.935333 8.191624 5.428021 6.452632 8.921568 7.355455
[43] 6.667635 7.229011 7.034179 6.786918 8.854385 7.253310 7.719861
[50] 7.084997
w = model_from_pmf$w
w = (w/sum(w)) * length(w)
hist(w, main = "w (scaled to have mean 1)")
l0 (same situation for f0)
The initialization for l0 is good (I also checked that l0 is still good after the first iteration), but the final output is worse. So the algorithm seems to move gradually to a bad local optimum.
par(mfrow = c(2,2))
plot(truth$l0, main = "l0 from truth")
## fitted l0 (note the minimum is 1e-8)
plot(model_from_pmf$l0, main = "l0 from model_from_pmf")
## this is what is used for initialization (note the minimum is 1e-8)
plot(init$ebpmf_wbg$l0, main = "l0 in initialization")
## a natural guess for l0, f0 is a rank-1 fit, which is proportional to the row and column means of X
plot(rowMeans(X), main = "rowMeans(X)")
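The comment above about the rank-1 fit can be checked on a toy matrix. This is a minimal sketch, assuming "rank-1 fit" means the Poisson model X[i, j] ~ Poisson(l[i] * f[j]) (my reading, not stated in the code). Its maximum-likelihood fit is outer(rowSums(X), colSums(X)) / sum(X), so l is proportional to rowMeans(X) and f to colMeans(X). The toy data and the names X_toy, lam_hat and l0_guess are made up for illustration.

```r
set.seed(1)
# Toy count matrix (hypothetical data, not the simulation used in this analysis)
n <- 6; p <- 4
X_toy <- matrix(rpois(n * p, lambda = 5), nrow = n, ncol = p)

# Closed-form rank-1 Poisson MLE: lam_hat[i, j] = rowSum_i * colSum_j / total
lam_hat <- outer(rowSums(X_toy), colSums(X_toy)) / sum(X_toy)

# The fit reproduces the row and column totals of the data
print(all.equal(rowSums(lam_hat), rowSums(X_toy)))
print(all.equal(colSums(lam_hat), colSums(X_toy)))

# Any column of the fit, once normalized, matches the normalized row means
l0_guess <- rowMeans(X_toy)
print(all.equal(lam_hat[, 1] / sum(lam_hat[, 1]), l0_guess / sum(l0_guess)))
```

This is why the rowMeans(X) panel is a reasonable reference for what l0 should look like, up to a scale factor.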
L
The majority of the entries of L are small numbers instead of 1. Each topic seems to capture around 20-30 top words, although obviously there is no correspondence between the fitted topics and the truth.
par(mfrow = c(2,2))
ks = c(1, 12, 23, 42)
for(k in ks){
  l_fitted = model_from_pmf$qg$qls_mean[,k]
  plot(l_fitted, ylab = sprintf("%dth loading", k))
  print(sprintf("median of %dth loading %s", k, median(l_fitted)))
}
[1] "median of 1th loading 0.137447098615451"
[1] "median of 12th loading 0.245723105965793"
[1] "median of 23th loading 0.12774018431737"
[1] "median of 42th loading 0.199135453189951"
See how much L changes from initialization.
par(mfrow = c(2,2))
ks = c(1, 12, 23, 42)
for(k in ks){
  plot(model_from_pmf_iter1$qg$qls_mean[,k], model_from_pmf$qg$qls_mean[,k],
       log = "xy", xlab = "iter1", ylab = "iterN")
}
The top words change little from the first initialization.
g
This explains why the majority of L are very small. The huge phi carries the majority of the weight, and a huge phi favors both smaller and bigger numbers than a small phi does (a small phi favors 1).
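To see why weight on a huge phi pushes L away from 1, here is a minimal sketch. It assumes each mixture component of g is a gamma distribution with mean 1 and variance phi, i.e. Gamma(shape = 1/phi, rate = 1/phi). This parameterization is an assumption made for illustration and is not stated in the text above, and the helper mass_between is hypothetical.

```r
# Assumed component: Gamma(shape = 1/phi, rate = 1/phi), i.e. mean 1 and variance phi
mass_between <- function(phi, lower, upper) {
  pgamma(upper, shape = 1 / phi, rate = 1 / phi) -
    pgamma(lower, shape = 1 / phi, rate = 1 / phi)
}

phis <- c(0.001, 100)

# Prior mass close to 1 (between 0.9 and 1.1) for each phi
near_one <- sapply(phis, mass_between, lower = 0.9, upper = 1.1)
# Prior mass on small values (below 0.5) for each phi
below_half <- sapply(phis, mass_between, lower = 0, upper = 0.5)

print(data.frame(phi = phis, near_one = near_one, below_half = below_half))
```

Under this assumption the small-phi component puts almost all of its mass near 1, while the large-phi component puts most of its mass on small values, which is consistent with the small loading medians reported above.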
## pi = 0.8 for phi = 1e+8, pi = 0.13 for phi = 0.001 (truth: around 0.01 are top doc)
Pi_L = get_prior_summary(model_from_pmf$qg$gls, log10 = TRUE, return_matrix = TRUE)
## pi = 0.6 for phi = 1e+8, pi = 0.13 ~ 0.2 for phi = 0.001 (truth: around 0.01 are top words)
Pi_F = get_prior_summary(model_from_pmf$qg$gfs, log10 = TRUE, return_matrix = TRUE)
model_from_pmf
fitting
Take a block of lam-deviation and compare the truth to the two models. (The scale of the heatmap is not clear, so I show a histogram beside it.)
model_from_truth identifies most of the structure (it probably makes a few mistakes, like when it misses 3 top words above).
model_from_pmf does not seem to get the structure.
block_row = 1:100
block_col = 1:200
lam_devia_from_pmf = model_from_pmf$qg$qls_mean %*% t(model_from_pmf$qg$qfs_mean)
lam_devia_from_truth = model_from_truth$qg$qls_mean %*% t(model_from_truth$qg$qfs_mean)
lam_devia_truth = truth$L %*% t(truth$F)
block_k_from_pmf = lam_devia_from_pmf[block_row,block_col]
block_k_from_truth = lam_devia_from_truth[block_row,block_col]
block_k_truth = lam_devia_truth[block_row,block_col]
## I scaled the block for comparison
block_k_truth = block_k_truth/sum(block_k_truth)
block_k_from_truth = block_k_from_truth/sum(block_k_from_truth)
block_k_from_pmf = block_k_from_pmf/sum(block_k_from_pmf)
par(mfrow = c(4,2))
image(block_k_truth)
hist(log(block_k_truth))
image(block_k_from_truth)
hist(log(block_k_from_truth))
image(block_k_from_pmf)
hist(log(block_k_from_pmf))
plot(block_k_truth, block_k_from_truth, log = "xy")
plot(block_k_truth, block_k_from_pmf, log = "xy")
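Each block above is divided by its sum before plotting, which removes any overall scale difference between the truth and the fits. A small sketch with made-up matrices (block_a and block_b are hypothetical, not objects from the analysis) shows the effect:

```r
# Hypothetical 2 x 3 block and a copy that differs only by a constant factor
block_a <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
block_b <- 10 * block_a

# Same normalization as above: divide each block by its total
block_a_scaled <- block_a / sum(block_a)
block_b_scaled <- block_b / sum(block_b)

print(all.equal(block_a_scaled, block_b_scaled))  # the scale difference is gone
print(sum(block_a_scaled))                        # each scaled block sums to 1
```

After this normalization, the scatter plots of the scaled blocks compare relative structure only, so points off the diagonal reflect differences in pattern rather than in overall magnitude.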
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.15.7
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pheatmap_1.0.12 Matrix_1.2-17 ggplot2_3.3.0 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 RColorBrewer_1.1-2 compiler_3.5.1 pillar_1.4.4
[5] later_1.1.0.1 git2r_0.26.1 tools_3.5.1 digest_0.6.25
[9] lattice_0.20-38 evaluate_0.14 lifecycle_0.2.0 tibble_3.0.1
[13] gtable_0.3.0 pkgconfig_2.0.3 rlang_0.4.6 yaml_2.2.0
[17] xfun_0.8 withr_2.2.0 stringr_1.4.0 dplyr_0.8.1
[21] knitr_1.28 fs_1.3.1 vctrs_0.3.0 rprojroot_1.3-2
[25] grid_3.5.1 tidyselect_0.2.5 glue_1.4.1 R6_2.4.1
[29] rmarkdown_2.1 farver_2.0.3 purrr_0.3.4 magrittr_1.5
[33] whisker_0.3-2 backports_1.1.7 scales_1.1.1 promises_1.1.1
[37] htmltools_0.5.0 ellipsis_0.3.1 assertthat_0.2.1 colorspace_1.4-1
[41] httpuv_1.5.4 labeling_0.3 stringi_1.4.3 munsell_0.5.0
[45] crayon_1.3.4