Create a workflowr project

Last updated: 2020-06-29

Checks: 7 passed, 0 failed

Knit directory: workflowr-useR2020/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2.9000). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20200611)

The command set.seed(20200611) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
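To see why this matters, the base R sketch below draws "random" numbers twice after setting the same seed. `sample()` is only a stand-in for any random step in an analysis, such as subsampling or permutations.

```r
# Draw five numbers after setting the seed ...
set.seed(20200611)
first <- sample(1:100, 5)

# ... and again after resetting the same seed
set.seed(20200611)
second <- sample(1:100, 5)

identical(first, second)  # TRUE: same seed, same draws
```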

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 5cef3eb

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 5cef3eb. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know whether there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/workflowr.Rmd) and HTML (docs/workflowr.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	622788b	John Blischak	2020-06-29	Fix typo.
Rmd	c3d41b6	John Blischak	2020-06-25	Finish the section on deploying the website.
Rmd	c2c51f4	John Blischak	2020-06-24	Continue working on section to create workflowr project.
Rmd	59a4726	John Blischak	2020-06-17	Start outline of creating workflowr website

Now that you have the Spotify analysis running, you will create a workflowr project to share the results and make sure it stays reproducible. This is a modified version of the official workflowr vignette “Getting started with workflowr”.

Start by loading the workflowr package in the R console.

library(workflowr)

Workflowr uses Git to version all the changes to the code and results. For each version it snapshots, Git requires a user name and email for attribution. Thus before creating the workflowr project, tell Git the name and email address you would like to use. It is convenient, but not required, for the email address to be the same one you used to register with GitHub. This command only has to be run once per computer (i.e. not every time you create a new workflowr project).

# Replace the example text with your information
wflow_git_config(user.name = "Your Name", user.email = "email@domain")

You can run wflow_git_config() again with no arguments to confirm that Git is configured properly.

wflow_git_config()

Now that Git is configured, you can start your workflowr project using wflow_start(). It is more common to start a workflowr project in a brand new directory, but the workflowr setup can also be added to an existing analysis. To limit the tutorial to one RStudio Cloud project, you will create the workflowr project in the same directory where you’ve already been working on the Spotify analysis.

The first argument to wflow_start() is the directory. The command below uses the relative path ., which refers to the current working directory. In this case it is equivalent to using the absolute path /cloud/project/. The second argument is the name of the project, which will be displayed on the website. The third argument informs wflow_start() that the directory already exists, since by default it expects to create a new one.

wflow_start(directory = ".", name = "Spotify song analysis", existing = TRUE)
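If you want to convince yourself that "." and the absolute path point to the same place, base R can resolve both. On RStudio Cloud both should print as /cloud/project; elsewhere they will show your own working directory.

```r
# "." is resolved against the current working directory, so these agree
same_dir <- normalizePath(".") == normalizePath(getwd())
same_dir  # TRUE
```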

This adds various files and directories. For the purpose of the tutorial, focus on the following subset:

├── _workflowr.yml    # workflowr-specific settings
├── analysis/         # Rmd files
│   ├── index.Rmd     # Creates website homepage
│   └── _site.yml     # website-specific settings
├── data/             # data files
├── docs/             # website files

The most important thing to remember is to perform your analysis in Rmd files in analysis/, and that the website files are saved in docs/.
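To check this layout from the R console, the sketch below defines a small hypothetical helper, `missing_workflowr_paths()` (not a workflowr function), that reports which of the paths in the tree above do not exist. After wflow_start() it should return an empty character vector; wflow_start() creates other files too, which this check ignores.

```r
# Hypothetical helper, not part of workflowr: list the key paths from the
# tree above that are missing from the project directory `dir`
missing_workflowr_paths <- function(dir = ".") {
  expected <- c("_workflowr.yml", "analysis", "analysis/index.Rmd",
                "analysis/_site.yml", "data", "docs")
  expected[!file.exists(file.path(dir, expected))]
}

missing_workflowr_paths(".")  # character(0) if everything is in place
```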

To see the default state of the website, run wflow_build(). It will build all the Rmd files currently in analysis/ and save the HTML files in docs/. The website will either pop up in a new window or be displayed in the Viewer pane.

wflow_build()

Conveniently, if you run it a second time, it does nothing because the Rmd files have not changed since the last time the HTML files were built.

wflow_build()
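The wflow_build() documentation describes this default behavior (make = TRUE when no files are specified) as building files modified more recently than their HTML files, inspired by Make. The sketch below illustrates that rule with file modification times; `needs_build()` is a hypothetical name, and this is not workflowr’s internal code.

```r
# Make-style rule (illustration only): an Rmd file needs (re)building if its
# HTML file is missing or older than the Rmd file
needs_build <- function(rmd, html) {
  !file.exists(html) | file.mtime(rmd) > file.mtime(html)
}

rmd  <- tempfile(fileext = ".Rmd")
html <- tempfile(fileext = ".html")
writeLines("Some analysis", rmd)
needs_build(rmd, html)  # TRUE: no HTML yet

writeLines("<p>Rendered</p>", html)
Sys.setFileTime(rmd, Sys.time() - 60)  # pretend the Rmd was edited a minute ago
Sys.setFileTime(html, Sys.time())
needs_build(rmd, html)  # FALSE: the HTML is newer than the Rmd
```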

Next add the Spotify analysis files to the workflowr project. You can do this via the Files pane or by running the commands below in the R console. The Rmd file goes to analysis/ and the data file to data/.

file.rename("spotify.Rmd", "analysis/spotify.Rmd")
file.rename("spotify.csv", "data/spotify.csv")
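file.rename() is vectorized and returns TRUE for each file it moved, so you could also move both files in one call and stop if anything went wrong. The sketch below does this in a throwaway temporary directory with empty placeholder files, not in your real project.

```r
# Throwaway sandbox standing in for the project directory (not /cloud/project)
proj <- file.path(tempdir(), "rename-demo")
dir.create(file.path(proj, "analysis"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data"), showWarnings = FALSE)
invisible(file.create(file.path(proj, c("spotify.Rmd", "spotify.csv"))))

old <- setwd(proj)
# Vectorized move: one logical result per file
moved <- file.rename(from = c("spotify.Rmd", "spotify.csv"),
                     to   = c("analysis/spotify.Rmd", "data/spotify.csv"))
setwd(old)

stopifnot(all(moved))  # error if either move failed
```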

And since spotify.csv is no longer in the same directory as spotify.Rmd, you need to update the path passed to read.csv(). By default, all Rmd files in a workflowr project are executed in the root of the project, so the updated path is data/spotify.csv. Open analysis/spotify.Rmd and change the import line to the line below:

spotify <- read.csv("data/spotify.csv", stringsAsFactors = FALSE)
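Why the path must change can be seen by resolving the same relative path from two working directories. The sketch below builds a throwaway project in a temporary directory with a tiny made-up CSV; data/spotify.csv is found from the project root but not from inside analysis/.

```r
# Throwaway project in a temporary directory, with a tiny made-up CSV
proj <- file.path(tempdir(), "path-demo")
dir.create(file.path(proj, "analysis"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data"), showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(proj, "data", "spotify.csv"),
          row.names = FALSE)

old <- setwd(proj)                  # working directory = project root
from_root <- file.exists("data/spotify.csv")
setwd(file.path(proj, "analysis"))  # as if knitting from analysis/
from_analysis <- file.exists("data/spotify.csv")
setwd(old)

c(from_root = from_root, from_analysis = from_analysis)  # TRUE FALSE
```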

Run wflow_build() again. This will build analysis/spotify.Rmd and then open the new file docs/spotify.html.

wflow_build()

Check the accuracy of the decision tree model and the random guessing model. Did you obtain the same results as everyone else?

You should have, because workflowr automatically sets the seed at the beginning of each Rmd file. The default seed for a project is the date the project was created, in the format YYYYMMDD. Open the file _workflowr.yml to confirm.
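The sketch below mimics that check. It writes a mock seed line to a temporary file (your real _workflowr.yml contains more settings), extracts the number with base R, and compares it with the date 2020-06-11 written as YYYYMMDD, which matches the seed 20200611 reported for this page.

```r
# Mock of the seed line in _workflowr.yml (illustrative only)
yml <- tempfile(fileext = ".yml")
writeLines("seed: 20200611", yml)

# Extract the seed with base R (no YAML package needed for one line)
seed_line <- grep("^seed:", readLines(yml), value = TRUE)
seed <- as.numeric(sub("^seed:[[:space:]]*", "", seed_line))

# The same number written as a YYYYMMDD date
date_seed <- as.numeric(format(as.Date("2020-06-11"), "%Y%m%d"))
seed == date_seed  # TRUE
```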

You can also look at the reproducibility report at the top of the file to see the seed that is set and all the other reproducibility checks. The first check is failing because the Rmd file hasn’t been versioned with Git yet.

So far you’ve run wflow_build() multiple times. This builds the HTML files, but it doesn’t version anything with Git. Run wflow_status() to see the current status of the Rmd files. The default Rmd files are “Unpublished” because the Rmd files have been committed to Git but their HTML files have not. The new spotify.Rmd is “Scratch” because it has not been committed to Git yet.

wflow_status()
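The three categories described above can be summarized as a simple rule. The sketch below encodes that simplified rule in a hypothetical helper, `status_label()`; the real wflow_status() also tracks whether committed files have since been modified, which this sketch ignores.

```r
# Hypothetical helper (not part of workflowr) encoding the simplified rules:
# Scratch = Rmd not committed; Unpublished = Rmd committed but HTML not;
# Published = both committed
status_label <- function(rmd_committed, html_committed) {
  ifelse(!rmd_committed, "Scratch",
         ifelse(html_committed, "Published", "Unpublished"))
}

status_label(rmd_committed  = c(TRUE,  TRUE, FALSE),
             html_committed = c(FALSE, TRUE, FALSE))
# "Unpublished" "Published" "Scratch"
```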

There currently isn’t a convenient way to navigate to the Spotify analysis on the website. To fix this, open analysis/index.Rmd, add the line below, and run wflow_build() again.

[Link to spotify analysis](spotify.html)

The line above is Markdown syntax to create a link to the file spotify.html.
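The link target is just spotify.html, with no directory, because analysis/index.Rmd and analysis/spotify.Rmd are both rendered into docs/, so the two HTML files sit side by side. The sketch below derives the HTML name from the Rmd name and formats the same Markdown link.

```r
# Derive the HTML file name from the Rmd file name, then format the link
rmd <- "analysis/spotify.Rmd"
html <- sub("\\.Rmd$", ".html", basename(rmd))
link <- sprintf("[%s](%s)", "Link to spotify analysis", html)
link  # "[Link to spotify analysis](spotify.html)"
```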

Now that the website is looking good, it is time to “publish” your results.

wflow_publish(c("analysis/*Rmd", "data/spotify.csv"), "Add spotify analysis")
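wflow_publish() accepts file globs, so "analysis/*Rmd" selects every file in analysis/ whose name ends in Rmd, including the new spotify.Rmd. The base R function Sys.glob() shows what such a pattern matches; the sandbox below uses empty made-up files.

```r
# Throwaway directory with a few example files
proj <- file.path(tempdir(), "glob-demo")
dir.create(file.path(proj, "analysis"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(proj, "analysis",
                                c("index.Rmd", "spotify.Rmd", "_site.yml"))))

old <- setwd(proj)
matched <- Sys.glob("analysis/*Rmd")  # _site.yml does not match
setwd(old)

sort(matched)  # "analysis/index.Rmd" "analysis/spotify.Rmd"
```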

wflow_publish() performs three steps:

1. Commits the Rmd files and data file with Git
2. Executes the code to build the HTML files
3. Commits the HTML files with Git
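The three steps can be sketched with other workflowr functions, wflow_git_commit() and wflow_build(). This is a simplified illustration, not how wflow_publish() is implemented: the real function does more (for example, it also commits figure files generated during the build). The sketch assumes the default layout where analysis/name.Rmd becomes docs/name.html, and `publish_sketch()` is a hypothetical name.

```r
# Simplified sketch for illustration only; use wflow_publish() in practice
publish_sketch <- function(files, message) {
  files <- Sys.glob(files)                # expand patterns like "analysis/*Rmd"
  rmd <- files[grepl("\\.Rmd$", files)]

  # 1. Commit the Rmd files and data file with Git
  workflowr::wflow_git_commit(files, message = message)

  # 2. Execute the code to build the HTML files (saved in docs/)
  workflowr::wflow_build(files = rmd, view = FALSE)

  # 3. Commit the HTML files with Git
  html <- file.path("docs", sub("\\.Rmd$", ".html", basename(rmd)))
  workflowr::wflow_git_commit(html, message = message)
}
```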

Re-running wflow_status() shows that all the files are “Published”:

wflow_status()

Now that the files are committed in the Git repository, it’s time to push them to GitHub to share the analysis results.

Go to https://github.com/new (you’ll be prompted to log in first if needed) to create a new repository. In the box labeled “Repository name”, type “workflowr-spotify”. Leave all the other settings as is, and click on the button “Create repository”.

Back in the R console, you need to inform Git of the GitHub repository you just created. In Git terminology, a Git repository on a cloud service like GitHub is called a “remote”. Workflowr provides a convenience function wflow_git_remote() to register remote repositories with Git. Run the command below using your GitHub username:

wflow_git_remote("origin", user = "<github-username>", repo = "workflowr-spotify")

Run wflow_git_remote() a second time, this time with no arguments, to have it list the available remote repositories. The URL will look something like https://github.com/<github-username>/workflowr-spotify.git.

wflow_git_remote()
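With the default HTTPS protocol, the remote URL follows a fixed pattern built from your username and the repository name. The sketch below assembles it with sprintf(); the username is a placeholder, so substitute your own.

```r
# Placeholder username; replace with your own
user <- "your-github-username"
repo <- "workflowr-spotify"
remote_url <- sprintf("https://github.com/%s/%s.git", user, repo)
remote_url  # "https://github.com/your-github-username/workflowr-spotify.git"
```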

Note that “origin” is an alias for referring to that long URL. The name “origin” is a convention, and could be anything you want. The main benefit of following the convention is that it makes online troubleshooting resources easier to follow.

Now that Git knows about the remote GitHub repository, push the project to GitHub using wflow_git_push(). You’ll be prompted to enter your GitHub username in the R console, followed by a secure pop-up window for entering your GitHub password.

wflow_git_push()

The updated GitHub repository will automatically open in a new tab. To activate the website, navigate to the Settings tab and scroll down to the section “GitHub Pages”. Set the “Source” to “master branch /docs folder”. After the page updates, scroll back down to the same section to retrieve the URL. It will look like https://<github-username>.github.io/workflowr-spotify/. Click on it to view your workflowr website. If it displays a “404 Not Found” error, manually add index.html to the end of the URL in the web browser. This is only needed while GitHub Pages is first launching your website; long-term you don’t have to do this.
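The website URL follows the same kind of pattern as the remote URL, and the 404 workaround just appends index.html. The sketch below builds both with a placeholder username.

```r
# Placeholder username; replace with your own
user <- "your-github-username"
repo <- "workflowr-spotify"
site_url <- sprintf("https://%s.github.io/%s/", user, repo)
fallback_url <- paste0(site_url, "index.html")  # try this if you first get a 404
```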

Now you have a website containing your reproducible results that you can share with your colleagues! And each time you make changes and push them to GitHub, the website will automatically update.

sessionInfo()

John Blischak

2020-06-17