Last updated: 2020-02-25

Checks: 7 0

Knit directory: RcppComputingClub/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20200224) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


working directory clean

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd e98074c scristia 2020-02-25 working on index
Rmd 33eeea2 scristia 2020-02-24 wflow_git_commit(all = TRUE, files = “intro.Rmd”)

#include <RcppArmadillo.h>
// Note #include <Rcpp.h> implied by RcppArmadillo.
using namespace Rcpp;

What is Rcpp

  • Rcpp: Seamless integration between R and C++
  • Extremely simple to connect C++ with R . Rcpp Library in C++ enables R -like syntax and operations on imported objects from R .
  • Maintained by Dirk Eddelbuetter and Romain Francois
  • Well supported by Rstudio.
  • Supported by knitr as well ( rcpp in code chunk header).
    • See source of this Rmd for example how to cache all Rcpp source chunks into a single compilation unit.
  • Supports any C++ language standard the underlying compiler supports: C++98, C++11, C++14, C++17
    • Packages using Rcpp can deploy standards suppported by R: C++, C++11 and very soon C++14

Simple examples

Let’s begin by examining to very simple examples of calling C++ functions within R :

// [[Rcpp::export]]
int square(int x) {
    return x*x;
}

// [[Rcpp::export]]
int add(int x, int y, int z) {
    int sum = x + y + z;
    return sum;
}

Note the // [[Rcpp::export]] line allows the function to be exported into R . Within R , I can now call:

square(7L)
[1] 49
add(1, 2, 3)
[1] 6

It is very straightforward to call the C++ implemented function within R. The motivation behind Rcpp comes when you run into a computation in R that create a performance bottleneck in your code. By instead writing your function in C++, you can achieve a massive speed boost. Rcpp seeks to make the interaction between R and C++ as seamless and convenient as possible.

Using cppFunction

It can be even easier to parse C++ code in R

Rcpp::cppFunction('double accu(NumericVector x) { return(
      std::accumulate(x.begin(), x.end(), 0.0)
   );
}')
res <- accu(seq(1, 10, by=0.5))
res
[1] 104.5

cppFunction parses C++ code and compiles it for use within R. If you have an external .cpp file you want to call within R, can use sourceCpp. You can also evaluate a single C++ statement with evalCpp()

# Showing maximum value of double.
Rcpp::evalCpp('std::numeric_limits<double>::max()')
[1] 1.797693e+308

Why C++?

  • One of the most frequently used programming languages in the world.
  • Speed.
  • Good chance what you want is already implemented in C++
  • From wikipedia: ‘C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose, powerful programming language.’

Why not C++?

  • More difficult to debug
  • More difficult to modify
  • The The population of potentials users who understand both R and C++ is smaller.

Why Rcpp?

  • Easy to use (honest).
  • Clean and approachable API that enable for high performance code.
  • R style vectorized code at C++ level.
  • Programmer time vs computer time: much more efficient code that does not take much longer to write.
  • Enables access to advanced data structures and algorithms implemented in C++ but not provided by R.
    • eg STL (C++ Standard Template Library), Boost
  • Handles garbage collection and the Rcpp programmer should never have to worry about memory allocation and deallocation.
  • If you are developing a package and you want people to use that package, you probably want it to be fast.

Quick C++ primer

Here’s a 2-minute C++ introduction:

// This is a comment
/* Also this */

// [[Rcpp::export]]
double sumC(NumericVector x) {
    int n = x.size();
    double total = 0;
    for(int i = 0; i < n; ++i) {
        total += x[i]; if(total > 100) break;
    }
    return total;
}
sumC(seq(1:10))
[1] 55

Notice the following things about C++:

  • Need to initialize your variables with data type.
  • for loops of structure for(initialization; condition; increment).
  • Conditionals are the same as R.
  • End every statement with a semicolon;
  • Vectors and arrays are 0-indexed.
  • size() is a member function on the vector class - x.size() returns the size of x.
  • We also saw previously you can call a function from a particular C++ library with :: (std::accumulate), similar to R.
  • While C++ can be a very complex language, just knowing these will enable you to write faster R functions.

Rcpp Basics

A quick introduction to Rcpp objects and operations:

Data structures

  • All R objects are internally represented by a SEXP: a pointer to an S expression object.
  • Any R object can be passed down to C++ code: vectors, matrices lists. Even functions and environments.
  • A large number of user-visible classes for R objects, which contain pointers the the SEXP object.
    • IntegerVector
    • NumericVector
    • LogicalVector
    • CharacterVector
    • NumericMatrix
    • S4
    • and many more

Rcpp Sugar

  • Rcpp sugar brings a higher level of abstraction to C++ code written in Rcpp.
  • Avoid C++ loops with code that strongly resembles R.
  • Takes advantage of operator overloading.
  • Despite the similar syntax, peformance is much faster in C++, though not quite as fast as manually optimized C++ code.

Example:

// Rcpp implementation
// [[Rcpp::export]]
NumericVector pdistCpp(double x, NumericVector ys) {
    return pow((x-ys), 2);
}
# R implementation
pdistR <- function(x, ys) {
    (x - ys)^2
}

pdistR(5.0, c(4.1, -9.3, 0, 13.7))
[1]   0.81 204.49  25.00  75.69
pdistCpp(5.0, c(4.1, -9.3, 0, 13.7))
[1]   0.81 204.49  25.00  75.69

Note the similarity of the Rcpp and R implementations. Rcpp performs R styled vectorizations.

Logical Operators

// two integer vectors of the same size
NumericVector x;
NumericVector y;
// expressions involving two vectors
LogicalVector res = x < y;
LogicalVector res = x != y;
// one vector, one single value
LogicalVector res = x < 2;
// two expressions
LogicalVector res = (x + y) == (x*x);
// functions producing single boolean result
all(x*x < 3);
any(x*x < 3);

R-like functions

There are many functions similar to what exists inside R:

is_na(x);
seq_along(x);
sapply( seq_len(10), square<int>() );
ifelse( x < y, x, (x+y)*y );
pmin( x, x*x);
diff( xx );
intersect( xx, yy); //returns interserct of two vectors
unique( xx ); // subset of unique values in input vecto

// math functions
abs(x); exp(x); log(x); ceil(x);
sqrt(x); sin(x); gamma(x);
range(x);
mean(x); sd(x); var(x);
which_min(x); which_max(x);
// A bunch more

Density and RNG functions

Rcpp has access to the same density, distribution, and RNG functions used by R itself. The seed you set in R will even be passed into Rcpp. For example, you can draw from a gamma distribution with scale and shape parameters equal to 1 with:

RNGScope scope;
// [[Rcpp::export]]
NumericVector getRGamma() {
            NumericVector x = rgamma( 10, 1, 1 );
            return x;
}
set.seed(1234)
getRGamma()
 [1] 0.01072755 0.74699906 0.78632796 0.11687523 0.92185869 0.17617898
 [7] 1.43714223 0.15671970 0.21950202 3.52757261
set.seed(1234)
rgamma(10, 1, 1)
 [1] 0.01072755 0.74699906 0.78632796 0.11687523 0.92185869 0.17617898
 [7] 1.43714223 0.15671970 0.21950202 3.52757261

Printing to console:

Don’t use stdout and stderr as you would in C++. Instead, use Rcpp::Rcout or Rcpp::print.

// [[Rcpp::export]]
void useOperatorOnVector(NumericVector x) { 
    Rcpp::Rcout << "Rcpp vector is " << std::endl << x << std::endl;
}


// [[Rcpp::export]]
void callPrint(RObject x) { 
    Rcpp::print(x);             // will work on any SEXP object
}
v <- seq(0.0, 10.0, by=2.5)
useOperatorOnVector(v)
Rcpp vector is 
0 2.5 5 7.5 10
callPrint(v)
[1]  0.0  2.5  5.0  7.5 10.0

Be careful with pointers!

// [[Rcpp::export]]
NumericVector logRcpp1(NumericVector invec) {
    NumericVector outvec(invec);
    for(int i=0; i<invec.size(); i++) {
        outvec[i] = log(invec[i]);
    }
    return outvec;
}
x <- seq(1.0, 3.0, by=1)
cbind(x, logRcpp1(x))
             x          
[1,] 0.0000000 0.0000000
[2,] 0.6931472 0.6931472
[3,] 1.0986123 1.0986123

Note: outvec and invec point to the same underlying R object.

Instead, Use clone to not modify original vector.

// [[Rcpp::export]]
NumericVector logRcpp2(NumericVector invec) {
    NumericVector outvec=clone(invec);
    for(int i=0; i<invec.size(); i++) {
        outvec[i] = log(invec[i]);
    }
    return outvec;
}
x <- seq(1.0, 3.0, by=1)
cbind(x, logRcpp2(x))
     x          
[1,] 1 0.0000000
[2,] 2 0.6931472
[3,] 3 1.0986123

RcppArmadillo

  • Armadillo is a high level and easy to use C++ linear algebra library with syntax similar to Matlab.
  • RcppArmadillo is an Rcpp interface allowing access to the Armadillo library.
  • Supports a large body of linear algebra functions.
#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
void showMatrix(arma::mat X) {
    Rcout << "Armadillo matrix is" << std::endl << X << std::endl;
}

// [[Rcpp::export]]
List svd_arma(arma::mat x) {
    arma::mat xtx = x.t() * x ;
    arma::mat U ;
    arma::vec s ;
    arma::mat V ;
    svd(U, s, V, xtx) ;
    List ret ;
    ret["U"] = U ;
    ret["s"] = s ;
    ret["V"] = V ;
    return(ret) ;
}
M <- matrix(1:9,3,3)
showMatrix(M)
Armadillo matrix is
   1.0000   4.0000   7.0000
   2.0000   5.0000   8.0000
   3.0000   6.0000   9.0000
## SVD implemented in Armadillo
svd_arma(M)
$U
           [,1]       [,2]       [,3]
[1,] -0.2148372  0.8872307  0.4082483
[2,] -0.5205874  0.2496440 -0.8164966
[3,] -0.8263375 -0.3879428  0.4082483

$s
             [,1]
[1,] 2.838586e+02
[2,] 1.141413e+00
[3,] 1.462429e-15

$V
           [,1]       [,2]       [,3]
[1,] -0.2148372  0.8872307 -0.4082483
[2,] -0.5205874  0.2496440  0.8164966
[3,] -0.8263375 -0.3879428 -0.4082483

Note above was an example of returning multiple objects to R in a list.

Do you need Rcpp?

Before using Rcpp in your package, you should first consider if it’s necessary.

Typical R bottlenecks

  • Loops that depend on previous iterations
    • MCMC methods, EM Algorithm, gradient descent…
  • Function calls in R slow, but very little overhead in C++.
    • Recursive functions are very inefficient in R.
  • Not having access to advanced data structures algorithms in R but available in C++.

When is Rcpp not the solution?

  • Sometimes the solution is to become a better R coder.
  • Take advantage of vectorization when possible.
  • Most base R functions already call C functions. Make sure there isn’t already an efficient implementation of what you are trying to do.
  • If your bottleneck is with manipulating large rectangular data (ie over 1 million rows or thousands of columns), consider data.table.

Other resources


sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] nvimcom_0.9-81

loaded via a namespace (and not attached):
 [1] workflowr_1.6.0           Rcpp_1.0.3               
 [3] rprojroot_1.3-2           digest_0.6.21            
 [5] later_1.0.0               R6_2.4.0                 
 [7] backports_1.1.5           git2r_0.26.1             
 [9] magrittr_1.5              evaluate_0.14            
[11] stringi_1.4.3             rlang_0.4.0              
[13] RcppArmadillo_0.9.850.1.0 fs_1.3.1                 
[15] promises_1.1.0            whisker_0.4              
[17] rmarkdown_1.16            tools_3.5.2              
[19] stringr_1.4.0             glue_1.3.1               
[21] httpuv_1.5.2              xfun_0.10                
[23] yaml_2.2.0                compiler_3.5.2           
[25] htmltools_0.4.0           knitr_1.25