Recently, our group did an exercise that required setting up R on macOS. Here, I share how I set up R on my macOS machines (Intel and Apple Silicon). The tutorial is divided into:

  1. Installing system dependencies required for R libraries using homebrew.
  2. Installing R libraries using renv.
  3. Saving and restoring R environments using renv.
  4. Installing R libraries hosted on private repositories.
  5. Building R libraries from source using Makevars.

Installing Homebrew

# install xcode CLI
xcode-select --install
# install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Installing system dependencies

Create a Brewfile for batch installation of R, RStudio and the required system libraries:

brew "ccache"     # Object-file caching compiler wrapper
brew "coreutils"  # GNU File, Shell, and Text utilities
brew "gdal"       # Geospatial Data Abstraction Library
brew "geos"       # Geometry Engine
brew "gfortran"   # GNU compiler collection
brew "graphviz"   # Graph visualization software from AT&T and Bell Labs
brew "libgit2"    # C library of Git core methods that is re-entrant and linkable
brew "libjpeg"    # Image manipulation library
brew "libomp"     # LLVM's OpenMP runtime library
brew "libxml2"    # GNOME XML library
brew "llvm"       # Next-gen compiler infrastructure
brew "openblas"   # Optimized BLAS library
brew "openssl"    # Cryptography and SSL/TLS Toolkit
brew "poppler"    # PDF rendering library (based on the xpdf-3.0 code base)
brew "proj"       # Cartographic Projections Library
brew "readline"   # Library for command-line editing
brew "unixodbc"   # ODBC 3 connectivity for UNIX
brew "xz"         # General-purpose data compression with high compression ratio
brew "zlib"       # General-purpose lossless data-compression library
brew "r"
cask "rstudio"

Then run the installation with the following brew command:

brew bundle install --force --file Brewfile -v
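
Before running the bundle, it can help to double-check what the Brewfile actually declares. The following is a minimal sketch, not part of Homebrew: brewfile_entries is a hypothetical helper that assumes one brew "..." or cask "..." entry per line, as in the Brewfile above, and skips commented-out lines.

```shell
# List the formula and cask names declared in a Brewfile.
# Usage: brewfile_entries [path-to-Brewfile]   (defaults to ./Brewfile)
# NOTE: brewfile_entries is a hypothetical helper, not a Homebrew command.
brewfile_entries() {
  # Keep only lines that start with brew "..." or cask "...",
  # then print the entry kind followed by the quoted name.
  sed -E -n 's/^[[:space:]]*(brew|cask)[[:space:]]+"([^"]*)".*$/\1 \2/p' "${1:-Brewfile}"
}

# Example: show the entries if a Brewfile exists in the current folder.
if [ -f Brewfile ]; then
  brewfile_entries Brewfile
fi
```

Homebrew itself also offers brew bundle check --file Brewfile, which reports whether everything in the Brewfile is already installed.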

Installing R libraries

To set up a reproducible R environment, I use renv. In the following example, I create a new environment for installing signature.tools.lib with these steps:

  1. Set the current project folder as the current working directory
  2. Install the renv library
  3. Initialise and activate the renv environment

setwd("~/R_Projects/signature_tools_tutorial")

install.packages('renv')
renv::init()
renv::activate()

This way, only the renv library will be installed in our system-wide location, e.g., /usr/local/Cellar/r/ (Intel macOS) or /opt/homebrew/Cellar/r/ (Apple Silicon).
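
The two locations above follow from Homebrew's default prefix on each architecture. As a minimal sketch, the shell function below maps the output of uname -m to that prefix. The name homebrew_prefix_for_arch is hypothetical, and the mapping assumes a default Homebrew installation; brew --prefix reports the actual prefix if Homebrew lives elsewhere.

```shell
# Map a CPU architecture (as printed by `uname -m`) to the default Homebrew prefix.
# NOTE: hypothetical helper; assumes Homebrew was installed in its default location.
homebrew_prefix_for_arch() {
  case "$1" in
    arm64)  echo "/opt/homebrew" ;;  # Apple Silicon
    x86_64) echo "/usr/local" ;;     # Intel
    *)      echo "unknown architecture: $1" >&2; return 1 ;;
  esac
}

# Print where the Homebrew-installed R would be expected on this machine.
if prefix="$(homebrew_prefix_for_arch "$(uname -m)")"; then
  echo "$prefix/Cellar/r/"
fi
```

Note that a terminal running under Rosetta reports x86_64 even on Apple Silicon, so the result reflects the shell's architecture rather than the hardware.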

Any project-specific libraries are installed into the renv cache located at ~/Library/Caches/org.R-project.R. When creating a new environment, renv links the libraries from this cache into the renv folder inside the project folder. This keeps the system library clean and shares one cache across different R environments.
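
One way to see this linking in practice is to list the symbolic links inside the project library. This is only a sketch: linked_packages is a hypothetical helper, and it assumes the project library sits under renv/library in the project folder and that renv is linking packages rather than copying them.

```shell
# List symbolic links under a project library together with their targets.
# Usage: linked_packages [library-dir]   (defaults to ./renv/library)
# NOTE: hypothetical helper; the default path is an assumption about the project layout.
linked_packages() {
  find "${1:-renv/library}" -type l | sort | while IFS= read -r link; do
    # Print "<package> -> <where the link points>".
    printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
  done
}
```

Run from the project folder, linked_packages should print one line per linked library, with targets pointing into the shared location.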

To install project-specific R libraries, use the renv::install() function. For example, signature.tools.lib has the following dependencies:

renv::install("bioc::VariantAnnotation")
renv::install("bioc::BSgenome.Hsapiens.UCSC.hg38")
renv::install("bioc::BSgenome.Hsapiens.1000genomes.hs37d5")
renv::install("bioc::BSgenome.Mmusculus.UCSC.mm10")
renv::install("bioc::BSgenome.Cfamiliaris.UCSC.canFam3")
renv::install("bioc::SummarizedExperiment")
renv::install("bioc::BiocGenerics")
renv::install("bioc::GenomeInfoDb")
renv::install("NMF")
renv::install("foreach")
renv::install("doParallel")
renv::install("lpSolve")
renv::install("ggplot2")
renv::install("cluster")
renv::install("methods")
renv::install("stats")
renv::install("linxihui/NNLM")
renv::install("nnls")
renv::install("GenSA")
renv::install("gmp")
renv::install("plyr")
renv::install("RCircos")
renv::install("scales")
renv::install("bioc::GenomicRanges")
renv::install("bioc::IRanges")
renv::install("bioc::BSgenome")
renv::install("readr")
renv::install("doRNG")
renv::install("combinat")
renv::install("Nik-Zainal-Group/signature.tools.lib.dev")
renv::install("Nik-Zainal-Group/teachingmutationalsignatures")

Saving and restoring R environments

To record the above environment, we use the renv::snapshot() function. However, by default renv only records libraries that are used somewhere in the project's code (!). Therefore, I created a libraries.R file that loads all the required libraries:

library("VariantAnnotation")
library("BSgenome.Hsapiens.UCSC.hg38")
library("BSgenome.Hsapiens.1000genomes.hs37d5")
library("BSgenome.Mmusculus.UCSC.mm10")
library("BSgenome.Cfamiliaris.UCSC.canFam3")
library("SummarizedExperiment")
library("BiocGenerics")
library("GenomeInfoDb")
library("NMF")
library("foreach")
library("doParallel")
library("lpSolve")
library("ggplot2")
library("cluster")
library("methods")
library("stats")
library("NNLM")
library("nnls")
library("GenSA")
library("gmp")
library("plyr")
library("RCircos")
library("scales")
library("GenomicRanges")
library("IRanges")
library("BSgenome")
library("readr")
library("doRNG")
library("combinat")
library("signature.tools.lib")
library("teachingmutationalsignatures")

Saving the environment generates an renv.lock file.

# save renv.lock file
renv::snapshot()

Now, to restore the environment in a new location or on a new machine, I copy the two files libraries.R and renv.lock into a new project folder, then run the following commands:

# if `renv` is not installed on the new system
install.packages('renv')

renv::init()
# or
renv::restore()
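
Before restoring on a new machine, it can be useful to see which libraries and versions the lockfile pins. The sketch below is an illustration rather than an renv feature: lockfile_packages is a hypothetical helper, and it assumes the usual pretty-printed renv.lock layout in which each library record has a "Package" line followed by a "Version" line.

```shell
# Print "name version" for each library recorded in an renv.lock file.
# Usage: lockfile_packages [path-to-renv.lock]   (defaults to ./renv.lock)
# NOTE: hypothetical helper; relies on "Package" and "Version" being on separate lines.
lockfile_packages() {
  awk -F'"' '
    # Remember the library name when a "Package": "<name>" line is seen.
    /^[ \t]*"Package":/ { name = $4 }
    # Print it with the next "Version" line; the R version entry has no name and is skipped.
    /^[ \t]*"Version":/ && name != "" { print name, $4; name = "" }
  ' "${1:-renv.lock}"
}
```

Calling lockfile_packages in the new project folder gives a quick overview of what renv::restore() is expected to install.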

Installation from private repositories

For R libraries hosted on a private repository, we additionally need to authenticate with the hosting service. One way is to generate a personal access token (PAT). For example, on GitHub, go to Developer Settings, then Personal Access Tokens, and select the repo scope. The token will be displayed only once, so copy it somewhere safe and treat it like your password.

To authenticate using the generated PAT:

Sys.setenv(GITHUB_PAT = "PASTE_TOKEN_HERE")
renv::install("private_repo/R_package")
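
Pasting the token into a script makes it easy to commit by accident. Since R reads ~/.Renviron at startup, one alternative is to keep GITHUB_PAT there instead of calling Sys.setenv() in code. The sketch below shows one possible way to do that from the shell; save_github_pat is a hypothetical helper, and the choice of ~/.Renviron as the default file is an assumption about your setup.

```shell
# Store GITHUB_PAT=<token> in an Renviron-style file, replacing any existing
# GITHUB_PAT line, and make the file readable by the current user only.
# Usage: save_github_pat <token> [file]   (file defaults to ~/.Renviron)
# NOTE: hypothetical helper, not part of R or renv.
save_github_pat() {
  token="$1"
  file="${2:-$HOME/.Renviron}"
  touch "$file"
  # Copy everything except a previous GITHUB_PAT entry (grep exits 1 if nothing is left).
  grep -v '^GITHUB_PAT=' "$file" > "$file.tmp" || true
  # Append the new token and swap the file into place.
  printf 'GITHUB_PAT=%s\n' "$token" >> "$file.tmp"
  mv "$file.tmp" "$file"
  chmod 600 "$file"
}

# Example (not run here): save_github_pat "PASTE_TOKEN_HERE"
```

After restarting R, Sys.getenv("GITHUB_PAT") should return the token, and the renv::install() call above can be run without setting it in the script.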

Installation from source

To install R libraries from source, we need to tell R where to find the required system dependencies. This is done with a Makevars file, which is placed at ~/.R/Makevars and looks like this:

# --------
# Makevars
# --------

# General note

# Homebrew bin / opt / lib locations

### uncomment on Apple Silicon
#HB=/opt/homebrew/bin
#HO=/opt/homebrew/opt
#HL=/opt/homebrew/lib
#HI=/opt/homebrew/include

### uncomment on Intel
#HB=/usr/local/bin
#HO=/usr/local/opt
#HL=/usr/local/lib
#HI=/usr/local/include

# MacOS Xcode header location
# (do "xcrun -show-sdk-path" in terminal to get path)
XH=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk

# ccache
CCACHE=$(HB)/ccache

# Make using all cores (set # to # of cores on your machine)
MAKE=make -j4

# GNU version
GNU_VER=12

# LLVM (Clang) compiler options
CC=$(CCACHE) $(HO)/llvm/bin/clang
CXX=$(CC)++
CXX98=$(CC)++
CXX11=$(CC)++
CXX14=$(CC)++
CXX17=$(CC)++

# FORTRAN
FC=$(CCACHE) $(HB)/gfortran-$(GNU_VER)
F77=$(FC)
FLIBS=-L$(HL)/gcc/$(GNU_VER) -lgfortran -lquadmath -lm

# STD libraries
CXX1XSTD=-std=c++0x
CXX11STD=-std=c++11
CXX14STD=-std=c++14
CXX17STD=-std=c++17

# FLAGS
STD_FLAGS=-g -O3 -Wall -pedantic -mtune=native -pipe
CFLAGS=$(STD_FLAGS)
CXXFLAGS=$(STD_FLAGS)
CXX98FLAGS=$(STD_FLAGS)
CXX11FLAGS=$(STD_FLAGS)
CXX14FLAGS=$(STD_FLAGS)
CXX17FLAGS=$(STD_FLAGS)

# Preprocessor FLAGS
# NB: -isysroot reconfigures the include path to the Xcode SDK we set above
CPPFLAGS=-isysroot $(XH) -I$(HI) \
    -I$(HO)/llvm/include -I$(HO)/openssl/include \
    -I$(HO)/gettext/include -I$(HO)/tcl-tk/include

# Linker flags (suggested by homebrew)
LDFLAGS+=-L$(HO)/llvm/lib -Wl,-rpath,$(HO)/llvm/lib

# Flags for OpenMP support that should allow packages that want to use
# OpenMP to do so (data.table), and other packages that bork with
# -fopenmp flag (stringi) to be left alone
SHLIB_OPENMP_CFLAGS=-fopenmp
SHLIB_OPENMP_CXXFLAGS=-fopenmp
SHLIB_OPENMP_CXX98FLAGS=-fopenmp
SHLIB_OPENMP_CXX11FLAGS=-fopenmp
SHLIB_OPENMP_CXX14FLAGS=-fopenmp
SHLIB_OPENMP_CXX17FLAGS=-fopenmp
SHLIB_OPENMP_FCFLAGS=-fopenmp
SHLIB_OPENMP_FFLAGS=-fopenmp

There are three things we need to pay attention to in the file above:

  1. Depending on the architecture of the macOS machine, Intel vs Apple Silicon (M1/M2), uncomment the corresponding HB, HO, HL and HI lines.
  2. Run xcrun -show-sdk-path to make sure the SDK is located at /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk; if it is not, update XH.
  3. To find the installed version of GCC, look in /usr/local/lib/gcc/ on an Intel Mac, or /opt/homebrew/lib/gcc/ on Apple Silicon, and set GNU_VER accordingly.
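
The GCC version lookup can also be scripted. The following is a minimal sketch under the assumption that Homebrew exposes each installed GCC major version as a numeric subdirectory of lib/gcc, which is what the FLIBS line in the Makevars relies on. detect_gnu_ver is a hypothetical helper name.

```shell
# Print the highest GCC major version found as a numeric subdirectory of
# the given directory (e.g. /opt/homebrew/lib/gcc or /usr/local/lib/gcc).
# NOTE: hypothetical helper; prints nothing if the directory is missing or empty.
detect_gnu_ver() {
  ls "$1" 2>/dev/null | grep -E '^[0-9]+$' | sort -n | tail -n 1
}

# Check both default Homebrew locations and report the value to use for GNU_VER.
for dir in /opt/homebrew/lib/gcc /usr/local/lib/gcc; do
  ver="$(detect_gnu_ver "$dir" || true)"
  if [ -n "$ver" ]; then
    echo "GNU_VER=$ver  (from $dir)"
  fi
done
```

If nothing is printed, GCC is probably not installed through Homebrew in either default location, and GNU_VER has to be determined by hand.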

I have used the above method to install the data.table library with OpenMP to make use of its multiprocessing support. You can read about it here.