If you are struggling with large data sets in R, data.table can provide a performance boost. However, when loading data.table, especially on macOS, you might see a warning that OpenMP support was not detected, which makes data.table run in single-threaded mode. This prevents you from getting the full benefit of data.table and of the underlying hardware.
    library(data.table)
    data.table 1.14.0 using 1 threads (see ?getDTthreads). Latest news: r-datatable.com
    **********
    This installation of data.table has not detected OpenMP support.
    It should still work but in single-threaded mode.
    If this is a Mac, please ensure you are using R>=3.4.0 and have followed our Mac instructions here:
    https://github.com/Rdatatable/data.table/wiki/Installation.
    This warning message should not occur on Windows or Linux. If it does, please file a GitHub issue.
    **********

    getDTthreads()
    1
What is the issue here?
OpenMP is an API for shared-memory multithreading, and the Clang compiler that ships with Xcode on macOS does not support it out of the box: Apple has chosen not to include the libomp.dylib run-time library with their compiler. You can verify this by running the following command.
    $ clang -c omp.c -fopenmp
    clang: error: unsupported option '-fopenmp'
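The contents of omp.c are not shown in this post. As a rough sketch, assuming a minimal OpenMP program and a hypothetical helper named supports_openmp, the same check can be scripted for any compiler. It defaults to clang when the CC environment variable is unset.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): returns 0 if the given
# compiler can build a tiny OpenMP program with -fopenmp, 1 otherwise.
supports_openmp() {
    compiler="$1"
    tmpdir=$(mktemp -d) || return 2

    # Assumed stand-in for omp.c: prints the number of threads in the team.
    cat > "$tmpdir/omp.c" <<'EOF'
#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        #pragma omp single
        printf("threads: %d\n", omp_get_num_threads());
    }
    return 0;
}
EOF

    # Compile and link; a missing compiler or an unsupported flag both fail here.
    if "$compiler" -fopenmp "$tmpdir/omp.c" -o "$tmpdir/omp" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}

if supports_openmp "${CC:-clang}"; then
    echo "OpenMP: supported"
else
    echo "OpenMP: not supported"
fi
```

Running it once with the default clang and once with CC pointing at another compiler gives a quick before-and-after comparison.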
One way to restore OpenMP support in clang is to 1) install the latest official LLVM release and 2) instruct R to compile data.table with OpenMP support using a Makevars file. Other R packages that support OpenMP will also benefit from this change.
How to proceed?
On macOS, it is highly recommended to use the Homebrew package manager for both automation and reproducibility. First, make sure that the latest version of Xcode is installed. Then install Homebrew using the following command (for more details, check the Homebrew website).
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Now, it’s time to install the latest version of the LLVM compiler.
    brew update && brew install llvm
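To confirm where Homebrew placed the new compiler, a small check along these lines may help. It is only a sketch: llvm_clang_path is a hypothetical helper, and the opt/llvm/bin/clang layout under the Homebrew prefix is assumed from the paths used in the Makevars file in this post.

```shell
#!/bin/sh
# Hypothetical helper: expected location of Homebrew's clang under a prefix.
# Assumes the <prefix>/opt/llvm/bin/clang layout used later in the Makevars.
llvm_clang_path() {
    printf '%s/opt/llvm/bin/clang\n' "$1"
}

if command -v brew >/dev/null 2>&1; then
    prefix=$(brew --prefix)              # Homebrew's installation prefix
    clang_bin=$(llvm_clang_path "$prefix")
    if [ -x "$clang_bin" ]; then
        "$clang_bin" --version           # the Homebrew-installed clang
    else
        echo "not found: $clang_bin"
    fi
    # The clang that is first on PATH, for comparison.
    command -v clang || true
else
    echo "brew not found on PATH"
fi
```

If the two compilers differ, the Homebrew one is installed but is not the default on PATH, which is why R has to be pointed at it explicitly.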
Homebrew's LLVM formula is keg-only, which in Homebrew's terminology means it is installed alongside the system's compiler without replacing it. In other words, its binaries are not symlinked into Homebrew's prefix (/usr/local/ on Intel, /opt/homebrew/ on Apple Silicon). To instruct R to build new packages using the newly installed compiler, you need to create a
Makevars file and put it in
Let’s edit the Makevars file to include the paths of the new compiler and the compiler options required for OpenMP support. For example, on my machine running macOS Big Sur version 11.2.3 (at the time of writing this post), my Makevars file looks as follows.
    # --------
    # Makevars
    # --------

    # General note

    # Homebrew bin / opt / lib locations

    ### uncomment on Apple Silicon
    #HB=/opt/homebrew/bin
    #HO=/opt/homebrew/opt
    #HL=/opt/homebrew/lib
    #HI=/opt/homebrew/include

    ### uncomment on Intel
    #HB=/usr/local/bin
    #HO=/usr/local/opt
    #HL=/usr/local/lib
    #HI=/usr/local/include

    # MacOS Xcode header location
    # (do "xcrun -show-sdk-path" in terminal to get path)
    XH=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk

    # ccache
    CCACHE=$(HB)/ccache

    # Make using all cores (set # to # of cores on your machine)
    MAKE=make -j4

    # GNU version
    GNU_VER=12

    # LLVM (Clang) compiler options
    CC=$(CCACHE) $(HO)/llvm/bin/clang
    CXX=$(CC)++
    CXX98=$(CC)++
    CXX11=$(CC)++
    CXX14=$(CC)++
    CXX17=$(CC)++

    # FORTRAN
    FC=$(CCACHE) $(HB)/gfortran-$(GNU_VER)
    F77=$(FC)
    FLIBS=-L$(HL)/gcc/$(GNU_VER) -lgfortran -lquadmath -lm

    # STD libraries
    CXX1XSTD=-std=c++0x
    CXX11STD=-std=c++11
    CXX14STD=-std=c++14
    CXX17STD=-std=c++17

    # FLAGS
    STD_FLAGS=-g -O3 -Wall -pedantic -mtune=native -pipe
    CFLAGS=$(STD_FLAGS)
    CXXFLAGS=$(STD_FLAGS)
    CXX98FLAGS=$(STD_FLAGS)
    CXX11FLAGS=$(STD_FLAGS)
    CXX14FLAGS=$(STD_FLAGS)
    CXX17FLAGS=$(STD_FLAGS)

    # Preprocessor FLAGS
    # NB: -isysroot refigures the include path to the Xcode SDK we set above
    CPPFLAGS=-isysroot $(XH) -I$(HI) \
        -I$(HO)/llvm/include -I$(HO)/openssl/include \
        -I$(HO)/gettext/include -I$(HO)/tcl-tk/include

    # Linker flags (suggested by homebrew)
    LDFLAGS+=-L$(HO)/llvm/lib -Wl,-rpath,$(HO)/llvm/lib

    # Flags for OpenMP support that should allow packages that want to use
    # OpenMP to do so (data.table), and other packages that bork with
    # -fopenmp flag (stringi) to be left alone
    SHLIB_OPENMP_CFLAGS=-fopenmp
    SHLIB_OPENMP_CXXFLAGS=-fopenmp
    SHLIB_OPENMP_CXX98FLAGS=-fopenmp
    SHLIB_OPENMP_CXX11FLAGS=-fopenmp
    SHLIB_OPENMP_CXX14FLAGS=-fopenmp
    SHLIB_OPENMP_CXX17FLAGS=-fopenmp
    SHLIB_OPENMP_FCFLAGS=-fopenmp
    SHLIB_OPENMP_FFLAGS=-fopenmp
There are three things to pay attention to in the file above:
- Depending on the architecture of your Mac, Intel vs Apple Silicon (M1/M2), uncomment the corresponding block of HB, HO, HL and HI lines.
- Run xcrun -show-sdk-path to confirm that the SDK is located at /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk; if it is not, set XH to the path reported.
- To find the installed version of GCC for GNU_VER, look in /usr/local/lib/gcc/ on an Intel Mac, or /opt/homebrew/lib/gcc/ on Apple Silicon.
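The three checks above can be gathered into one short script. This is a sketch rather than a definitive recipe: homebrew_prefix_for is a hypothetical helper, the mapping from uname -m output to a prefix is an assumption based on the two blocks in the Makevars file, and the xcrun call uses the double-dash spelling --show-sdk-path from xcrun's help text. Note that uname -m reports x86_64 in a terminal running under Rosetta, even on Apple Silicon.

```shell
#!/bin/sh
# Hypothetical helper: map a `uname -m` value to the Homebrew prefix
# assumed by the two commented-out blocks in the Makevars file.
homebrew_prefix_for() {
    case "$1" in
        arm64) echo "/opt/homebrew" ;;   # Apple Silicon
        *)     echo "/usr/local" ;;      # Intel
    esac
}

prefix=$(homebrew_prefix_for "$(uname -m)")

# Values to uncomment in the Makevars file.
echo "HB=$prefix/bin"
echo "HO=$prefix/opt"
echo "HL=$prefix/lib"
echo "HI=$prefix/include"

# SDK path for XH (xcrun exists only on macOS with the developer tools).
if command -v xcrun >/dev/null 2>&1; then
    echo "XH=$(xcrun --show-sdk-path)"
fi

# Candidate values for GNU_VER: the version directories under lib/gcc.
if [ -d "$prefix/lib/gcc" ]; then
    ls "$prefix/lib/gcc"
fi
```

The printed lines can be compared with the Makevars file by eye; nothing is written to disk.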
Now, with the above Makevars in place, let’s start a new R session and compile data.table from source.
    remove.packages("data.table")
    install.packages("data.table", type = "source",
        repos = "https://Rdatatable.gitlab.io/data.table")
To check if we have successfully compiled data.table with OpenMP support, let’s load the library.
    library(data.table)
    # data.table 1.14.0 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
The message above shows that data.table is running in multithreaded mode and is currently using 4 threads. To change the number of threads used by data.table and make the setting persist across sessions, edit ~/.Rprofile and add the following command.
    # data.table configuration
    data.table::setDTthreads(4)
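If you prefer to script this step, the sketch below shows one way to add the line without duplicating it on repeated runs, and then to check the result from the terminal. The add_line_once helper is hypothetical; the check relies on Rscript reading ~/.Rprofile at startup and on getDTthreads(verbose = TRUE), which prints data.table's threading details.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file unless that exact line
# is already present (so running it twice does not duplicate the line).
add_line_once() {
    file="$1"
    line="$2"
    touch "$file"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example call, commented out so that pasting the sketch has no side effects:
# add_line_once "$HOME/.Rprofile" 'data.table::setDTthreads(4)'

# Report the thread settings a fresh R session ends up with.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'data.table::getDTthreads(verbose = TRUE)' \
        || echo "could not query data.table (is it installed?)"
fi
```

The verbose output includes the OpenMP details data.table detected, so it doubles as a check that the source build picked up OpenMP.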
Makevars configuration was adapted from the following gist.