Last updated: 2020-10-24

Checks: 7 passed, 0 failed

Knit directory: ebpmf_data_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20200511) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version fc174dc. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/ebpmf_bg_tutorial_cache/
    Ignored:    analysis/ebpmf_wbg_model_intro_cache/
    Ignored:    analysis/ebpmf_wbg_simulate_big_data2_cache/
    Ignored:    analysis/ebpmf_wbg_simulate_big_data_cache/
    Ignored:    analysis/ebpmf_wbg_simulation_cache/
    Ignored:    analysis/investigate_np_ebpmf_wbg_cache/
    Ignored:    analysis/pmf_greedy_experiment_cache/
    Ignored:    analysis/sla_data_analysis_k10_cache/
    Ignored:    data/.DS_Store
    Ignored:    output/.DS_Store
    Ignored:    topicView-app/.DS_Store

Untracked files:
    Untracked:  analysis/draft.Rmd
    Untracked:  analysis/ebpmf_wbg_simulation_big.Rmd
    Untracked:  analysis/heatmap.Rmd
    Untracked:  analysis/investigate_largeK.Rmd
    Untracked:  analysis/investigate_news_topics.Rmd
    Untracked:  analysis/summary_sla_news_nips.Rmd
    Untracked:  analysis/test.R
    Untracked:  script/Rplots.pdf
    Untracked:  script/save_volcano_plot.R
    Untracked:  topicView-app/app_utils.R
    Untracked:  topicView-app/data/
    Untracked:  topicView-app/output/
    Untracked:  topicView-app/rsconnect/

Unstaged changes:
    Modified:   analysis/ebpmf_wbg_simulate_big_data2.Rmd
    Deleted:    analysis/sla_data_analysis_k10.Rmd
    Deleted:    analysis/sla_data_analysis_k5.Rmd
    Deleted:    analysis/sla_data_analysis_k50.Rmd
    Modified:   code/util.R
    Deleted:    data/SLA/SCC2016/Code/APL/compCM.m
    Deleted:    data/SLA/SCC2016/Code/APL/compMuI.m
    Deleted:    data/SLA/SCC2016/Code/APL/compParamErr2.m
    Deleted:    data/SLA/SCC2016/Code/APL/cpl4c.m
    Deleted:    data/SLA/SCC2016/Code/APL/cplEstimParam.m
    Deleted:    data/SLA/SCC2016/Code/APL/cpl_basic_demo_PJ.m
    Deleted:    data/SLA/SCC2016/Code/APL/cpl_demo.m
    Deleted:    data/SLA/SCC2016/Code/APL/cpl_demo2a.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcBlkMod.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcBlkMod2.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcBlkMod3.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcbm_nmi_beta_D.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcbm_nmi_lambda_D.m
    Deleted:    data/SLA/SCC2016/Code/APL/dcbm_time_vs_n_D.m
    Deleted:    data/SLA/SCC2016/Code/APL/genDCBlkMod.c
    Deleted:    data/SLA/SCC2016/Code/APL/genDCBlkMod.mexa64
    Deleted:    data/SLA/SCC2016/Code/APL/genDCBlkMod2.m
    Deleted:    data/SLA/SCC2016/Code/APL/initLabel5b.m
    Deleted:    data/SLA/SCC2016/Code/BCPL/ProfileLike.m
    Deleted:    data/SLA/SCC2016/Code/BCPL/calCri1.m
    Deleted:    data/SLA/SCC2016/Code/BCPL/calCri2.m
    Deleted:    data/SLA/SCC2016/Code/BCPL/mutiExp.m
    Deleted:    data/SLA/SCC2016/Code/MatlabCode.m
    Deleted:    data/SLA/SCC2016/Code/NewmanSM/NewmanSM.m
    Deleted:    data/SLA/SCC2016/Code/coauthorThresh2GiantAdj.txt
    Deleted:    data/SLA/SCC2016/Code/coauthorThresh2GiantCommLabelK2Matlab.txt
    Deleted:    data/SLA/SCC2016/Code/functions.R
    Deleted:    data/SLA/SCC2016/Code/main.R
    Deleted:    data/SLA/SCC2016/Data/authorList.txt
    Deleted:    data/SLA/SCC2016/Data/authorPaperBiadj.txt
    Deleted:    data/SLA/SCC2016/Data/paperCitAdj.txt
    Deleted:    data/SLA/SCC2016/Data/paperList.txt
    Deleted:    data/SLA/SCC2016/ReadMe.txt
    Deleted:    data/uci_BoW.sh
    Deleted:    data/uci_BoW/docword.kos.txt
    Deleted:    data/uci_BoW/readme.txt
    Deleted:    data/uci_BoW/vocab.kos.txt
    Deleted:    output/sim/v0.4.5/fit_sim_bg_block_n1100_p2100_K50_ebpmf_wbg_maxiter_5000.Rout
    Deleted:    output/sim/v0.4.5/fit_sim_bg_block_n1100_p2100_K50_ebpmf_wbg_maxiter_5000_from_truth.Rout
    Deleted:    output/sim/v0.4.5/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter3.Rds
    Deleted:    output/sim/v0.4.5/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter5000.Rds
    Deleted:    output/sim/v0.4.5/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter5000_from_truth.Rds
    Deleted:    output/sim/v0.4.5/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter50_from_truth2.Rds
    Deleted:    output/uci_BoW/v0.3.8/fit_kos_ebpmf_bg_K20_maxiter_1000.Rout
    Deleted:    output/uci_BoW/v0.3.8/fit_kos_ebpmf_bg_K20_maxiter_500.Rout
    Deleted:    output/uci_BoW/v0.3.8/fit_kos_ebpmf_bg_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.8/kos_ebpmf_bg_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.8/kos_ebpmf_bg_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.8/kos_ebpmf_bg_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.8/kos_ebpmf_bg_K2_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_initLF_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_initLF_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_initLF_K300_maxiter_1000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_initLF_K500_maxiter_1000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_ebpmf_bg_initLF_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_pmf_initLF_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_pmf_initLF_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_pmf_initLF_K300_maxiter_1000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_pmf_initLF_K500_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/fit_kos_pmf_initLF_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K100_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K100_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K100_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K100_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_K50_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K100_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K100_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K100_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K100_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K100_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter5.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter100.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter200.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter300.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter400.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter600.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter700.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter800.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K300_maxiter900.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_ebpmf_bg_initLF50_K50_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_init_nmf_K100_iter50.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_init_nmf_K20_iter50.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_init_nmf_K300_iter50.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_init_nmf_K500_iter50.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_init_nmf_K50_iter50.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K100_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter5.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter100.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter200.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter300.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter400.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter600.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter700.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter800.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K300_maxiter900.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.3.9/kos_pmf_initLF50_K50_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initLF_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initLF_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initLF_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initL_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initL_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/fit_kos_ebpmf_wbg_initL_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initL50_K50_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K100_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K100_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K100_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K100_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K100_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K20_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K3_maxiter10.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter1000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter1500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter2000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter2500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter3000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter3500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter4000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter4500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_ebpmf_wbg_initLF50_K50_maxiter5000.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K100_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K20_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K300_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K3_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K500_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.2/kos_init_nmf_K50_iter50.Rds
    Deleted:    output/uci_BoW/v0.4.4/fit_kos_np_ebpmf_wbg_initLF_K100_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.4/fit_kos_np_ebpmf_wbg_initLF_K20_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.4/fit_kos_np_ebpmf_wbg_initLF_K50_maxiter_5000.Rout
    Deleted:    output/uci_BoW/v0.4.4/kos_np_ebpmf_wbg_initLF50_K100_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.4/kos_np_ebpmf_wbg_initLF50_K20_maxiter500.Rds
    Deleted:    output/uci_BoW/v0.4.4/kos_np_ebpmf_wbg_initLF50_K50_maxiter500.Rds
    Modified:   topicView-app/app.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/ebpmf_wbg_simulation_big2.Rmd) and HTML (docs/ebpmf_wbg_simulation_big2.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author   Date        Message
Rmd   fc174dc  zihao12  2020-10-24  update ebpmf_wbg_simulation_big2.Rmd

Introduction

  • What I did:
    • I ran an experiment on a bigger simulated dataset (\(n = 1100, p = 2100, K = 50\)). [add link to dataset]. In the previous experiment [link], the signal of the top words/documents was not strong enough, so even when initialized from the truth the model could not recover any structure. This time I increased that signal.
    • I tried initializing from the true \(l_0, L\) and \(f_0, F\), and from a rough pmf fit (50 iterations). I call the resulting fits model_from_truth and model_from_pmf, respectively.
  • What I found:
    • model_from_truth gets good results, and the fitted prior makes sense.
    • model_from_pmf is bad. It has a much lower ELBO but an even higher expected log-likelihood (which may point to a problem with the pmf fit), so its KL term must be very large. It does not learn the right structure.
    • ELBO seems to be a good measure of performance when the structure has a stronger signal. However, our algorithm cannot reach a good optimum from the pmf initialization.
  • Note: there is a small bug in placing the grid for g in model_from_truth (two very large phi values were left out), but the result seems okay.
knitr::opts_chunk$set(message = FALSE, warning = FALSE, autodep = TRUE)
library(ggplot2)
library(Matrix)
library(pheatmap)  ## used for the heatmaps below
source("code/misc.R")
source("code/util.R")

Load model and data

model_from_truth = readRDS("output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter2000_from_truth.Rds")
model_from_pmf = readRDS("output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter2000.Rds")
model_from_pmf_iter1 = readRDS("output/sim/v0.4.5/exper2/sim_bg_block_n1100_p2100_K50_ebpmf_wbg_K50_maxiter1.Rds")
truth = readRDS("output/sim/v0.4.5/exper2/truth.sim_bg_block_n1100_p2100_K50.Rds")
init = readRDS("output/sim/v0.4.5/exper2/init.sim_bg_block_n1100_p2100_K50.Rds")
X = read_sim_bag_of_words("output/sim/v0.4.5/exper2/docword.sim_bg_block_n1100_p2100_K50.txt")

See what the data looks like

## pheatmap uses grid graphics, so par(mfrow) does not apply; the two heatmaps are drawn separately
pheatmap(as.matrix(X)[1:100, 1:200], cluster_rows=FALSE, cluster_cols=FALSE,
         main = "X block")

pheatmap(truth$L[1:100,] %*% t(truth$F[1:200, ]), cluster_rows=FALSE, cluster_cols=FALSE,
         main = "deviation matrix block")

compare progress

progress_df = data.frame(elbo_from_truth = model_from_truth$ELBO,
                         Eloglik_from_truth = model_from_truth$ELBO + model_from_truth$KL,
                         elbo_from_pmf = model_from_pmf$ELBO,
                         Eloglik_from_pmf = model_from_pmf$ELBO + model_from_pmf$KL,
                         iter = seq_along(model_from_pmf$ELBO))

plt = ggplot(data = progress_df)+
  geom_line(aes(iter, elbo_from_truth, color = "elbo_from_truth")) +
  geom_line(aes(iter, Eloglik_from_truth, color = "Eloglik_from_truth")) +
  geom_line(aes(iter, elbo_from_pmf, color = "elbo_from_pmf")) +
  geom_line(aes(iter, Eloglik_from_pmf, color = "Eloglik_from_pmf")) +
  ylab("progress")
print(plt)

progress_df[length(model_from_pmf$ELBO),]
     elbo_from_truth Eloglik_from_truth elbo_from_pmf Eloglik_from_pmf iter
2000       -370589.4          -354103.4     -443463.1        -339576.2 2000

model_from_pmf is much worse than model_from_truth in ELBO (-443463.1 vs. -370589.4), which I think is a good indicator of model performance. However, model_from_pmf has an even higher expected log-likelihood (-339576.2 vs. -354103.4). Since ELBO = E-loglik - KL (as in the code above), its KL must be very large (about 103887 vs. 16486 for model_from_truth). This suggests a very bad local optimum.
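As a quick arithmetic check, the KL term at the final iteration is the gap between the E-loglik and ELBO columns of the table above. The sketch below only copies the printed values into a small data frame; it does not reload the fits.

```r
## Final-iteration (iter 2000) values copied from the progress table above
final <- data.frame(
  model   = c("from_truth", "from_pmf"),
  elbo    = c(-370589.4, -443463.1),
  eloglik = c(-354103.4, -339576.2)
)
## ELBO = E-loglik - KL, so KL is the gap between the two curves
final$kl <- final$eloglik - final$elbo
print(final)
```

The KL of model_from_pmf comes out roughly six times that of model_from_truth, which is what drives its lower ELBO despite the better E-loglik.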

look at model_from_truth

w

w = model_from_truth$w
hist(length(w) * w/sum(w), main = "w scaled to have mean 1")

deviation matrix L, F

Not surprisingly, it recovers the truth very well. Below I look at factor/loading 12.
It recovers the top documents well

k = 12
n_top = ncol(truth$top_doc)
l_fitted = model_from_truth$qg$qls_mean[,k]
l_truth = truth$L[,k]

## the red line marks 1
plot(l_truth, l_fitted)
abline(v = 1, col = "red")

sort(sort(l_fitted, index.return = TRUE, decreasing = TRUE)$ix[1:n_top])
 [1] 221 222 223 224 225 226 227 228 229 230
sort(truth$top_doc[k,])
 [1] 221 222 223 224 225 226 227 228 229 230

It recovers most top words (17 of the 20 true top words)

k = 12
n_top = ncol(truth$top_words)
f_fitted = model_from_truth$qg$qfs_mean[,k]
f_truth = truth$F[,k]

## the red line marks 1
plot(f_truth, f_fitted)
abline(v = 1, col = "red")

f_fitted_sorted = sort(f_fitted, index.return = TRUE, decreasing = TRUE)
sort(f_fitted_sorted$ix[1:n_top])
 [1]  441  442  443  445  446  447  449  450  452  453  454  455  456  457  458
[16]  459  460 1073 2060 2075
sort(truth$top_words[k,])
 [1] 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459
[20] 460
## fitted values of the selected top words (the last 3 are the wrong ones, with values close to 1)
f_fitted[sort(f_fitted_sorted$ix[1:n_top])]
 [1]  79.045095  24.197773  73.251358  26.541303  83.586287  67.092675
 [7] 119.780659  51.007861  65.587827  69.154921  86.199267  20.213477
[13]  30.458866  46.638130 115.936507  46.273014  43.845567   1.003489
[19]   1.003440   1.003590
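Rather than comparing sorted index vectors by eye, a small helper can count the overlap directly. `top_overlap()` is a hypothetical helper (not part of code/misc.R or code/util.R); a minimal sketch on toy values:

```r
## Hypothetical helper: how many true top indices appear among the
## n_top largest fitted values, and which ones are missed / spurious
top_overlap <- function(fitted, true_top) {
  n_top <- length(true_top)
  fitted_top <- order(fitted, decreasing = TRUE)[seq_len(n_top)]
  list(n_hit  = length(intersect(fitted_top, true_top)),
       missed = setdiff(true_top, fitted_top),
       extra  = setdiff(fitted_top, true_top))
}

## Toy example: 5 true top words; word 9 is fitted too small, word 7 sneaks in
fitted   <- c(1, 50, 60, 1.2, 70, 1, 1.5, 80, 1, 1)
true_top <- c(2, 3, 5, 8, 9)
res <- top_overlap(fitted, true_top)
str(res)
```

Applied as `top_overlap(model_from_truth$qg$qfs_mean[, 12], truth$top_words[12, ])`, it should report the same 17 hits (missing words 444, 448 and 451) as the sorted indices above, assuming there are no ties among the fitted values.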

Prior g

The fitted priors reflect the proportion of top words/documents well.

## pi = 0.01 for phi = 100, and 0.99 for phi = 0.001 (truth: about 1% of documents are top documents)
Pi_L = get_prior_summary(model_from_truth$qg$gls, log10 = TRUE, return_matrix = TRUE)

## pi around 0.01 for phi = 100, and 0.99 for phi = 0.001 (truth: about 1% of words are top words)
Pi_F = get_prior_summary(model_from_truth$qg$gfs, log10 = TRUE, return_matrix = TRUE)

look at model_from_pmf

w

Different from the truth (all ones).

model_from_pmf$w
 [1]  7.344054  7.351531  7.792479  9.372576  7.904134  8.211023  6.368980
 [8]  5.493236  5.939650  6.930396  6.943949  8.330629  8.334705  5.426788
[15]  7.269157  9.001968  6.606669 10.968828  7.538840  6.265577  7.287651
[22]  7.152494  8.178605  7.717499  8.361270  5.740166  5.704718  6.840604
[29]  6.543005  7.457368  6.833842  6.263324  6.631474  6.296690  8.866826
[36]  8.296635  8.935333  8.191624  5.428021  6.452632  8.921568  7.355455
[43]  6.667635  7.229011  7.034179  6.786918  8.854385  7.253310  7.719861
[50]  7.084997
w = model_from_pmf$w
w = (w/sum(w)) * length(w)
hist(w, main = "w (scaled to have mean 1)")

l0 (same situation for f0).

The initialization of l0 is good (I also checked that l0 is still good after the first iteration), but the final output is worse. So the algorithm seems to drift gradually toward a bad optimum.

par(mfrow = c(2,2))

plot(truth$l0, main = "l0 from truth")

## fitted l0 (note the minimum is 1e-8)
plot(model_from_pmf$l0, main = "l0 from model_from_pmf")

## the l0 used for initialization (note the minimum is 1e-8)
plot(init$ebpmf_wbg$l0, main = "l0 in initialization")

## a natural guess for l0, f0 is the rank-1 fit, which is proportional to the row and column means of X
plot(rowMeans(X), main = "rowMeans(X)")
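Why the rank-1 fit is proportional to the row and column means: ignoring the deviation part, the baseline model \(X_{ij} \sim \text{Pois}(l_{0i} f_{0j})\) has the closed-form maximum-likelihood fit \(l_{0i} f_{0j} = r_i c_j / N\), where \(r_i\) and \(c_j\) are row and column sums and \(N\) is the total count. The sketch below checks this on a toy matrix `Y` (not the simulated data) against a generic K = 1 multiplicative update for Poisson (KL) NMF; it is an illustration, not the initialization code used in this analysis.

```r
set.seed(1)
## Toy count matrix standing in for X
Y <- matrix(rpois(30 * 40, lambda = 3), nrow = 30, ncol = 40)

## Closed-form rank-1 Poisson MLE: Lambda_ij = r_i * c_j / N
Lambda_closed <- outer(rowSums(Y), colSums(Y)) / sum(Y)

## Same fit via K = 1 multiplicative updates for Poisson (KL) NMF
l0 <- runif(nrow(Y)); f0 <- runif(ncol(Y))
for (it in 1:20) {
  l0 <- l0 * as.vector((Y / outer(l0, f0)) %*% f0) / sum(f0)
  f0 <- f0 * as.vector(crossprod(Y / outer(l0, f0), l0)) / sum(l0)
}

max(abs(outer(l0, f0) - Lambda_closed))  # essentially 0
range(l0 / rowMeans(Y))                  # constant ratio: l0 is proportional to rowMeans(Y)
```

For K = 1 the update reaches the closed form after a single sweep; only the split of scale between `l0` and `f0` is arbitrary.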

Deviation L.

Most entries of L are small numbers rather than 1. Each loading seems to have around 20-30 large entries, although there is clearly no correspondence between the fitted topics and the truth.

par(mfrow = c(2,2))
ks = c(1, 12, 23, 42)
for(k in ks){
  l_fitted = model_from_pmf$qg$qls_mean[,k]
  plot(l_fitted, ylab = sprintf("loading %d", k))
  print(sprintf("median of loading %d: %s", k, median(l_fitted)))
}
[1] "median of loading 1: 0.137447098615451"
[1] "median of loading 12: 0.245723105965793"
[1] "median of loading 23: 0.12774018431737"
[1] "median of loading 42: 0.199135453189951"
See how much L changes between the first and the final iteration.

par(mfrow = c(2,2))
ks = c(1, 12, 23, 42)
for(k in ks){
  plot(model_from_pmf_iter1$qg$qls_mean[,k], model_from_pmf$qg$qls_mean[,k], 
       log = "xy", xlab = "iter1", ylab = "iterN")
}
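The scatter plots above can be summarized by one number per loading. `loading_stability()` is a hypothetical helper that computes the Spearman (rank) correlation between the same column of two loading matrices; values near 1 mean the ordering of documents barely changed, and the log axes do not matter because ranks are invariant to monotone transforms. Toy check:

```r
## Hypothetical helper: per-column Spearman correlation between two
## loading matrices of the same shape (e.g. iteration 1 vs. final)
loading_stability <- function(A, B, ks = seq_len(ncol(A))) {
  stopifnot(identical(dim(A), dim(B)))
  vapply(ks, function(k) cor(A[, k], B[, k], method = "spearman"), numeric(1))
}

## Column 1 of B is a noisy monotone transform of A; column 2 is unrelated
set.seed(2)
A <- matrix(rexp(200), nrow = 100, ncol = 2)
B <- cbind(A[, 1]^2 * exp(rnorm(100, sd = 0.01)), rexp(100))
r <- loading_stability(A, B)
print(r)
```

On the fits above it would be called as `loading_stability(model_from_pmf_iter1$qg$qls_mean, model_from_pmf$qg$qls_mean, ks = c(1, 12, 23, 42))`.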

The top documents (largest loadings) change little after the first iteration.

Prior g.

This explains why most entries of L are very small: the component with huge phi carries most of the weight, and it favors values much smaller or much larger than 1, whereas a small phi favors values close to 1.

## pi = 0.8 for phi = 1e+8, pi = 0.13 for phi = 0.001 (truth: about 1% of documents are top documents)
Pi_L = get_prior_summary(model_from_pmf$qg$gls, log10 = TRUE, return_matrix = TRUE)

## pi = 0.6 for phi = 1e+8, pi = 0.13 ~ 0.2 for phi = 0.001 (truth: about 1% of words are top words)
Pi_F = get_prior_summary(model_from_pmf$qg$gfs, log10 = TRUE, return_matrix = TRUE)
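The explanation above depends on how phi shapes each prior component. As an assumption (not confirmed by this page; check the ebpmf source), suppose each component is Gamma(shape = 1/phi, rate = 1/phi), which has mean 1 and variance phi. The sketch below compares how much mass such components put near 0 and near 1:

```r
## ASSUMPTION: prior component = Gamma(shape = 1/phi, rate = 1/phi),
## i.e. mean 1 and variance phi
phis <- c(1e-3, 1, 100, 1e8)
near_zero <- pgamma(0.1, shape = 1 / phis, rate = 1 / phis)   # P(l < 0.1)
near_one  <- pgamma(1.1, shape = 1 / phis, rate = 1 / phis) -
             pgamma(0.9, shape = 1 / phis, rate = 1 / phis)   # P(0.9 < l < 1.1)
data.frame(phi = phis, near_zero = near_zero, near_one = near_one)
```

Under this parametrization a phi = 0.001 component concentrates almost all of its mass in (0.9, 1.1), while a phi = 1e+8 component puts nearly all of it below 0.1 (with a very heavy right tail), which matches the small loadings seen for model_from_pmf.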

what is model_from_pmf fitting

  • I pick a block of the lam-deviation matrix and compare the truth with the two models (the color scale of the heatmap is hard to read, so I show a histogram beside it).
  • The block values in the truth form 3 clusters: top word/doc times top word/doc; top word/doc times others; others times others.
  • model_from_truth identifies most of the structure (it probably makes a few mistakes, such as the 3 missed top words above).
  • model_from_pmf does not seem to capture the structure.
block_row = 1:100
block_col = 1:200

lam_devia_from_pmf = model_from_pmf$qg$qls_mean %*% t(model_from_pmf$qg$qfs_mean) 
lam_devia_from_truth = model_from_truth$qg$qls_mean %*% t(model_from_truth$qg$qfs_mean) 
lam_devia_truth = truth$L %*% t(truth$F)

block_k_from_pmf = lam_devia_from_pmf[block_row,block_col] 
block_k_from_truth = lam_devia_from_truth[block_row,block_col] 
block_k_truth = lam_devia_truth[block_row,block_col] 

## normalize each block to sum to 1 so that the overall scale does not affect the comparison
block_k_truth = block_k_truth/sum(block_k_truth)
block_k_from_truth = block_k_from_truth/sum(block_k_from_truth)
block_k_from_pmf = block_k_from_pmf/sum(block_k_from_pmf)

par(mfrow = c(4,2))
image(block_k_truth)
hist(log(block_k_truth))

image(block_k_from_truth)
hist(log(block_k_from_truth))

image(block_k_from_pmf)
hist(log(block_k_from_pmf))

plot(block_k_truth, block_k_from_truth, log = "xy")
plot(block_k_truth, block_k_from_pmf, log = "xy")
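The last two scatter plots can also be summarized numerically by the correlation of log entries after normalizing each block to sum to 1, as done above. `block_log_cor()` is a hypothetical helper; `eps` only guards against `log(0)`. A sketch on toy blocks:

```r
## Hypothetical helper: normalize two nonnegative blocks to sum 1 and
## return the Pearson correlation of their log entries
block_log_cor <- function(ref, est, eps = 1e-12) {
  ref <- ref / sum(ref)
  est <- est / sum(est)
  cor(log(as.vector(ref) + eps), log(as.vector(est) + eps))
}

set.seed(3)
truth_block <- outer(rexp(100), rexp(200))
## a rescaled, slightly noisy copy vs. an unrelated block
good_fit <- 5 * truth_block * exp(rnorm(length(truth_block), sd = 0.1))
bad_fit  <- matrix(rexp(length(truth_block)), nrow = 100)
res <- c(good = block_log_cor(truth_block, good_fit),
         bad  = block_log_cor(truth_block, bad_fit))
print(res)
```

Here it would be called as `block_log_cor(block_k_truth, block_k_from_truth)` and `block_log_cor(block_k_truth, block_k_from_pmf)`.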


sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.15.7

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pheatmap_1.0.12 Matrix_1.2-17   ggplot2_3.3.0   workflowr_1.6.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         RColorBrewer_1.1-2 compiler_3.5.1     pillar_1.4.4      
 [5] later_1.1.0.1      git2r_0.26.1       tools_3.5.1        digest_0.6.25     
 [9] lattice_0.20-38    evaluate_0.14      lifecycle_0.2.0    tibble_3.0.1      
[13] gtable_0.3.0       pkgconfig_2.0.3    rlang_0.4.6        yaml_2.2.0        
[17] xfun_0.8           withr_2.2.0        stringr_1.4.0      dplyr_0.8.1       
[21] knitr_1.28         fs_1.3.1           vctrs_0.3.0        rprojroot_1.3-2   
[25] grid_3.5.1         tidyselect_0.2.5   glue_1.4.1         R6_2.4.1          
[29] rmarkdown_2.1      farver_2.0.3       purrr_0.3.4        magrittr_1.5      
[33] whisker_0.3-2      backports_1.1.7    scales_1.1.1       promises_1.1.1    
[37] htmltools_0.5.0    ellipsis_0.3.1     assertthat_0.2.1   colorspace_1.4-1  
[41] httpuv_1.5.4       labeling_0.3       stringi_1.4.3      munsell_0.5.0     
[45] crayon_1.3.4