Writing an R package interface
to C++ libraries
with Rcpp

Watal M. Iwasaki   @heavywatal
SOKENDAI, The Graduate University for Advanced Studies
2018-07-15 Tokyo.R #71

@heavywatal

Watal M. Iwasaki = 岩嵜 航
https://heavywatal.github.io/
PhD in Life Sciences, Tohoku University, Sendai
Evolutionary theory of complexity and diversity in biological systems.
Postdoc in SOKENDAI, Hayama
Evolution of diversity within a tumor/cancer.
Likes : 🍺 Beer, Sake, Whisky, Cooking : ♬ Heavy Metal, Classical, Folk

tumopp — tumor growth simulator in C++


Iwasaki and Innan (2017)

Available via Homebrew/Linuxbrew:

brew install heavywatal/tap/tumopp
tumopp -N10000 -D3 -Chex -k100 -d0.1 -m0.5 -o OUTPUT_DIR

Dependencies:

https://github.com/heavywatal/tumopp

Library structure:

CMAKE_INSTALL_PREFIX/
├── bin/
│   └── tumopp
├── include/
│   └── tumopp/
│       ├── cell.hpp
│       ├── simulation.hpp
│       └── tissue.hpp
└── lib/
    └── libtumopp.dylib    # libtumopp.so on Linux

Output structure:

OUTPUT_DIR/
├── drivers.tsv.gz
├── population.tsv.gz
├── program_options.conf
└── snapshots.tsv.gz

Workflow

  1. Run tumopp with some parameter sets from the command line.
  2. Write results to TSV files
  3. Start R
  4. Read TSV files as data.frames
  5. Visualize and analyze with tidyverse packages

Not too bad.
But it would be more convenient if I could run tumopp from within R:

library(tumopp)
results = tumopp(some_parameter_sets) %>% print()
     max  coord dimensions shape delta0  rho0 population  drivers
   <int> <char>      <int> <int>  <num> <num>     <list>   <list>
1: 10000    hex          3   100    0.1   0.5   <tbl_df> <tbl_df>

Seamless R and C++ integration with Rcpp

It is typically used to eliminate bottlenecks in R code.
Many online examples show how to define a short function:

library(Rcpp)

Rcpp::cppFunction('
int fibonacci(int x) {
  if (x < 2) return x;
  return fibonacci(x - 1) + fibonacci(x - 2);
}
')

fibonacci(8L)
[1] 21

But I already have C++ functions in my library.
How can I use them in R?

Create an R+Rcpp package from scratch

Package components (See http://r-pkgs.had.co.nz/ for details):

DESCRIPTION  # Package metadata
LICENSE
NAMESPACE    # List of objects to import/export
R/           # R code (*.R)
man/         # Object documentation (*.Rd)
src/         # C++ source code (*.cpp)
vignettes/   # Long-form guide

Use devtools and/or usethis to set up a skeleton:

usethis::create_package("tumopp")
usethis::use_mit_license()
usethis::use_roxygen_md()
usethis::use_package_doc()
usethis::use_rcpp()
usethis::use_git()

Then, modify DESCRIPTION, R/, and src/.

Package-wide settings in R/tumopp-package.R

#' @useDynLib tumopp, .registration = TRUE
#' @importFrom Rcpp sourceCpp
#' @importFrom magrittr %>%
#' @aliases NULL tumopp-package
#' @keywords internal
"_PACKAGE"

.onUnload = function(libpath) {
  library.dynam.unload("tumopp", libpath)
}
  • @useDynLib is needed to import compiled C++ functions.
  • @importFrom Rcpp sourceCpp makes sure the Rcpp namespace is loaded with the package; importing any Rcpp symbol would do.
  • "_PACKAGE" is a special string to generate package documentation.
  • .onUnload is recommended in http://adv-r.had.co.nz/Rcpp.html,
    but not used in the major packages these days…?
    Hadley said: “It is polite to define it, but easy to forget.”

Define Rcpp function to use external libraries

Create src/run.cpp:

// [[Rcpp::plugins(cpp14)]]
#include <Rcpp.h>
#include <tumopp/simulation.hpp>

//' Run C++ simulation
//' @param args command line arguments as a string vector
// [[Rcpp::export]]
Rcpp::CharacterVector
cpp_tumopp(const std::vector<std::string>& args) {
    tumopp::Simulation simulation(args);
    simulation.run();
    return Rcpp::CharacterVector::create(
        Rcpp::Named("config", simulation.config_string()),
        Rcpp::Named("specimens", simulation.specimens()),
        Rcpp::Named("drivers", simulation.drivers())
    );
}
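The wrapper above needs only three things from the library: a constructor taking the arguments as strings, a run() method, and accessors that return text. The following is a minimal sketch of that shape in plain C++, so the same logic can be compiled and tested without R. DemoSimulation and run_demo are hypothetical names for illustration only; they are not part of tumopp, whose real Simulation class is not shown in these slides.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a library class like tumopp::Simulation.
// Only the shape used by the wrapper is mirrored, not the real behavior.
class DemoSimulation {
  public:
    explicit DemoSimulation(const std::vector<std::string>& args)
    : args_(args) {}

    // Pretend to run; here it only records that run() was called.
    void run() { finished_ = true; }

    // One-row TSV table: a header line, then the values.
    std::string config_string() const {
        return "nargs\tfinished\n" + std::to_string(args_.size()) + "\t"
               + (finished_ ? "1" : "0") + "\n";
    }

  private:
    std::vector<std::string> args_;
    bool finished_ = false;
};

// R-free core of a wrapper: run the simulation and collect named text
// results. An exported Rcpp function could forward to a function like this.
inline std::map<std::string, std::string>
run_demo(const std::vector<std::string>& args) {
    DemoSimulation simulation(args);
    simulation.run();
    return {{"config", simulation.config_string()}};
}
```

Calling run_demo({"-N10", "-D3"}) returns a map whose "config" entry is a small TSV string. Keeping this part free of Rcpp headers is one possible design choice; it lets the library logic be unit-tested with an ordinary C++ compiler.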

Try devtools::check().

Configure compile options

Error: R does not know where my C++ library is located:

run.cpp:3:10: fatal error: 'tumopp/simulation.hpp' file not found
#include <tumopp/simulation.hpp>
         ^~~~~~~~~~~~~~~~~~~~~~~

Write compile options to src/Makevars directly:

CXX_STD=CXX14
PKG_CPPFLAGS=-DSTRICT_R_HEADERS -I/usr/local/include
PKG_LIBS=-L/usr/local/lib -Wl,-rpath,/usr/local/lib -ltumopp

Or use a configure or CMake script to generate src/Makevars from src/Makevars.in:

CXX_STD=CXX14
PKG_CPPFLAGS=-DSTRICT_R_HEADERS @CPPFLAGS@
PKG_LIBS=@LDFLAGS@ @LDLIBS@
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Configure-and-cleanup

Transform C++ strings to data.frame

Create R/tumopp.R:

#' `tumopp()` returns full results with config columns in a data.frame
#' @param args command line arguments as a string vector
#' @export
tumopp = function(args = character(0L)) {
  result = cpp_tumopp(c(0L, 0L, args))
  population = readr::read_tsv(result["specimens"])
  drivers = readr::read_tsv(result["drivers"])
  readr::read_tsv(result["config"]) %>%
    dplyr::mutate(population = list(population)) %>%
    dplyr::mutate(drivers = list(drivers))
}
     max  coord dimensions shape delta0  rho0 population  drivers
   <int> <char>      <int> <int>  <num> <num>     <list>   <list>
1: 10000    hex          3   100    0.1   0.5   <tbl_df> <tbl_df>

There must be some more efficient ways… feather? arrow?
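For reference, the text-based transfer used here can be sketched on the C++ side as follows: records are written as tab-separated text with a header line, and the resulting string is what readr::read_tsv() parses in R. The Row struct and its columns are hypothetical and chosen only for illustration; they are not the actual tumopp output columns.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; the real tumopp columns are not assumed here.
struct Row {
    int id;
    int ancestor;
    double birth;
};

// Serialize rows as tab-separated text with a header line.
inline std::string to_tsv(const std::vector<Row>& rows) {
    std::ostringstream oss;
    oss << "id\tancestor\tbirth\n";
    for (const auto& r : rows) {
        oss << r.id << '\t' << r.ancestor << '\t' << r.birth << '\n';
    }
    return oss.str();
}
```

This round trip is simple and easy to debug, but every number is formatted and parsed again, and a default output stream keeps only six significant digits of a double. That cost is the motivation for looking at binary formats.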

Workflow improvement

Before using Rcpp:

  1. Run tumopp with some parameter sets
  2. Write results to TSV files
  3. Start R
  4. Read TSV files as data.frames
  5. Visualize and analyze

Thanks to Rcpp:

  1. Start R
  2. Run tumopp with some parameter sets in R
  3. Get results in a nested data.frame
  4. Visualize and analyze

Problem: Modern C++11/14/17 supported?

http://gallery.rcpp.org/articles/rcpp-and-c++11-c++14-c++17/

🚧 Rtools 4 with gcc 8 is under development. (Thanks, Yutani-san!)
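One quick way to see which standard a given toolchain actually applies is to inspect the standard __cplusplus macro. The helper below is a small sketch; cxx_standard is a hypothetical name, not part of Rcpp or tumopp.

```cpp
#include <string>

// Map the __cplusplus macro to a readable standard name.
inline std::string cxx_standard() {
#if __cplusplus >= 201703L
    return "C++17 or later";
#elif __cplusplus >= 201402L
    return "C++14";
#elif __cplusplus >= 201103L
    return "C++11";
#else
    return "pre-C++11";
#endif
}

// A library that requires C++14 could instead fail early at compile time:
// static_assert(__cplusplus >= 201402L, "C++14 or later is required");
```

Compiling this with the same flags that R passes to the package shows whether CXX_STD=CXX14 in src/Makevars took effect on that platform.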

Tasks and Questions

✅ Core C++ library for tumor growth simulation
✅ R interface package using Rcpp and devtools
🚧 Visualization and analysis using tidyverse packages
🚧 Documentation using roxygen2, rmarkdown, and pkgdown
🚧 Tests using testthat
⬜ Hexagonal logo
✅ Publication doi:10.1371/journal.pone.0184229
✅ Advertisement at conferences (GSJ2017, SMBE2018, Tokyo.R#71)
⬜ Better way to transfer C++ data to R data.frames: feather? arrow?
⬜ C++ and R on Windows: WSL? Cygwin? MSYS? Docker?
✅ Talk over beers with YOU! 🍻

Reference