
Using data.table with OpenMP support

If you are struggling with large data sets in R, you might benefit from a performance boost by using data.table. However, when loading data.table, at least on macOS, you may receive a warning that no OpenMP support has been detected and that data.table is operating in a single-threaded mode. This will limit the benefits of using data.table in the first place by not taking full advantage of the underlying hardware.

library(data.table)
data.table 1.14.0 using 1 threads (see ?getDTthreads).  Latest news: r-datatable.com
**********
This installation of data.table has not detected OpenMP support.
It should still work but in single-threaded mode.
If this is a Mac, please ensure you are using R>=3.4.0 and have followed our Mac instructions here:
https://github.com/Rdatatable/data.table/wiki/Installation.
This warning message should not occur on Windows or Linux. If it does, please file a GitHub issue.
**********

getDTthreads()
[1] 1

What is the issue here?

OpenMP is an API for shared-memory multithreading, and the clang compiler that ships with Xcode on macOS does not support it out of the box. Apple's build of clang rejects the -fopenmp flag and does not include the libomp.dylib run-time library, which we can check by trying to compile a C file (here called omp.c) with that flag.

$ clang -c omp.c -fopenmp
clang: error: unsupported option '-fopenmp'
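The post does not show the contents of omp.c, so here is a minimal sketch of how the check above could be reproduced. The test program, the supports_openmp helper and the temporary directory are illustrative assumptions, not part of the original setup. Like the command above, it only compiles (-c) and does not link.

```shell
#!/bin/sh
# Write a tiny OpenMP test program into a scratch directory.
# (omp.c is just the file name used in the post; its contents are assumed.)
workdir=$(mktemp -d)
cat > "$workdir/omp.c" <<'EOF'
#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        #pragma omp single
        printf("OpenMP threads: %d\n", omp_get_num_threads());
    }
    return 0;
}
EOF

# supports_openmp COMPILER
# Succeeds if COMPILER can compile omp.c with -fopenmp (compile only, no link).
supports_openmp() {
    "$1" -c "$workdir/omp.c" -fopenmp -o "$workdir/omp.o" >/dev/null 2>&1
}

# Check the compiler named in $CC, or plain "clang" if CC is unset.
compiler=${CC:-clang}
if supports_openmp "$compiler"; then
    echo "$compiler: -fopenmp accepted"
else
    echo "$compiler: -fopenmp rejected (or compiler not found)"
fi
```

Running the same script again with CC pointing at another compiler gives a quick before/after comparison.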

One way to restore OpenMP support is to 1) install the latest official LLVM release and 2) instruct R to compile data.table with OpenMP support using a Makevars file. Other R packages which support OpenMP will also benefit from this upgrade.

How to proceed?

On macOS, it is highly recommended to use the Homebrew package manager to ensure both automation and reproducibility. First, we need to make sure that the Xcode Command Line Tools are installed.

xcode-select --install

Then, if Homebrew is not already installed, we can install it using the following command (for more details check the Homebrew website).

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Now, it’s time to install the latest version of the LLVM compiler.

brew update && brew install llvm

Homebrew installs LLVM as a keg-only formula, which means that it sits alongside the system compiler rather than replacing it. In other words, its binaries are not symlinked into /usr/local/. So, to instruct R to build new packages using the newly installed compiler, one needs to create a Makevars file and place it in ~/.R/Makevars.

# Create the directory first in case it does not exist yet
mkdir -p ~/.R
nvim ~/.R/Makevars
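Since a keg-only formula is not linked into the usual location, it helps to confirm where the compiler actually lives before writing paths into the Makevars. The following is a minimal sketch that uses brew --prefix to look this up. The llvm_prefix helper is a hypothetical name, and the fallback path is simply the one used in this post. Note that on Apple Silicon machines Homebrew's default prefix is /opt/homebrew rather than /usr/local, so the paths there will differ.

```shell
#!/bin/sh
# llvm_prefix: print the directory of Homebrew's llvm keg.
# Falls back to the path used in this post when brew is not on PATH.
llvm_prefix() {
    if command -v brew >/dev/null 2>&1; then
        brew --prefix llvm
    else
        echo "/usr/local/opt/llvm"
    fi
}

prefix=$(llvm_prefix)
echo "LLVM keg: $prefix"

# Confirm that the keg really contains a clang binary before relying on it.
if [ -x "$prefix/bin/clang" ]; then
    "$prefix/bin/clang" --version
else
    echo "no clang found under $prefix/bin"
fi
```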

Let’s edit the Makevars file to include the paths of the new compiler and the required compiler options to support OpenMP. For example, on my machine using macOS Big Sur Version 11.2.3 (at the time of writing this post), my Makevars file looks as follows.

#CC=clang
#CXX=clang++

# --------
# Makevars
# --------

# General note

# Homebrew bin / opt / lib locations
HB=/usr/local/bin
HO=/usr/local/opt
HL=/usr/local/lib
HI=/usr/local/include

# MacOS Xcode header location
# (do "xcrun -show-sdk-path" in terminal to get path)
XH=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk

# ccache
CCACHE=$(HB)/ccache

# Make using all cores (set # to # of cores on your machine)
MAKE=make -j4

# GNU version
GNU_VER=11

# LLVM (Clang) compiler options
CC=$(CCACHE) $(HO)/llvm/bin/clang
CXX=$(CC)++
CXX98=$(CC)++
CXX11=$(CC)++
CXX14=$(CC)++
CXX17=$(CC)++

# FORTRAN
FC=$(CCACHE) $(HB)/gfortran-$(GNU_VER)
F77=$(FC)
FLIBS=-L$(HL)/gcc/$(GNU_VER) -lgfortran -lquadmath -lm

# STD libraries
CXX1XSTD=-std=c++0x
CXX11STD=-std=c++11
CXX14STD=-std=c++14
CXX17STD=-std=c++17

# FLAGS
STD_FLAGS=-g -O3 -Wall -pedantic -mtune=native -pipe
CFLAGS=$(STD_FLAGS)
CXXFLAGS=$(STD_FLAGS)
CXX98FLAGS=$(STD_FLAGS)
CXX11FLAGS=$(STD_FLAGS)
CXX14FLAGS=$(STD_FLAGS)
CXX17FLAGS=$(STD_FLAGS)

# Preprocessor FLAGS
# NB: -isysroot points the include path at the Xcode SDK we set above
CPPFLAGS=-isysroot $(XH) -I$(HI) \
	-I$(HO)/llvm/include -I$(HO)/openssl/include \
	-I$(HO)/gettext/include	-I$(HO)/tcl-tk/include

# Linker flags (suggested by homebrew)
LDFLAGS+=-L$(HO)/llvm/lib -Wl,-rpath,$(HO)/llvm/lib

# Flags for OpenMP support that should allow packages that want to use
# OpenMP to do so (data.table), and other packages that bork with
# -fopenmp flag (stringi) to be left alone
SHLIB_OPENMP_CFLAGS=-fopenmp
SHLIB_OPENMP_CXXFLAGS=-fopenmp
SHLIB_OPENMP_CXX98FLAGS=-fopenmp
SHLIB_OPENMP_CXX11FLAGS=-fopenmp
SHLIB_OPENMP_CXX14FLAGS=-fopenmp
SHLIB_OPENMP_CXX17FLAGS=-fopenmp
SHLIB_OPENMP_FCFLAGS=-fopenmp
SHLIB_OPENMP_FFLAGS=-fopenmp

Note that you can run xcrun -show-sdk-path to get the path to the macOS SDK on your system, which is the value of XH above. To find the right value for GNU_VER, check the contents of the directory /usr/local/lib/gcc/, which in my case shows that GCC version 11 is installed.
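The two lookups just described can be sketched as a short script. The helper names sdk_path and gcc_versions are hypothetical, and the default directory is the one from this post, so it may differ on your machine.

```shell
#!/bin/sh
# sdk_path: print the macOS SDK path (candidate value for XH).
sdk_path() {
    if command -v xcrun >/dev/null 2>&1; then
        xcrun --show-sdk-path
    else
        echo "xcrun not found (not macOS, or Command Line Tools missing)" >&2
        return 1
    fi
}

# gcc_versions [DIR]: list the entries of Homebrew's gcc lib directory
# (candidate values for GNU_VER). DIR defaults to the path used in this post.
gcc_versions() {
    dir=${1:-/usr/local/lib/gcc}
    [ -d "$dir" ] || return 1
    ls "$dir"
}

if sdk=$(sdk_path); then
    echo "XH=$sdk"
fi

if vers=$(gcc_versions); then
    echo "GNU_VER candidates: $vers"
else
    echo "no GCC directory found under /usr/local/lib/gcc"
fi
```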

Note: For every GCC upgrade, one needs to modify the GNU_VER to match the current version.

Now, with the above Makevars in place, let’s start a new R session and compile data.table from source.

remove.packages("data.table")
install.packages("data.table", type = "source",
    repos = "https://Rdatatable.gitlab.io/data.table")

To check if we have successfully compiled data.table with OpenMP support, let’s load the library.

library(data.table)
# data.table 1.14.0 using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com

The message above shows that our data.table is working in multithreaded mode and is currently using 4 threads. To change the number of threads used by data.table and to make the setting persistent across sessions, edit ~/.Rprofile and include the following command.

# data.table configuration
data.table::setDTthreads(4)
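If you prefer to script this step, the following is a minimal sketch that appends the setting only when it is not already present, so running it twice does not duplicate the line. The add_dt_threads helper is a hypothetical name. The demo writes to a temporary file rather than to your real ~/.Rprofile. The final check assumes Rscript is on your PATH and uses the verbose argument of getDTthreads() to print data.table's OpenMP details.

```shell
#!/bin/sh
# add_dt_threads N [FILE]
# Append a setDTthreads(N) call to FILE unless FILE already mentions
# setDTthreads. FILE defaults to ~/.Rprofile.
add_dt_threads() {
    n=$1
    profile=${2:-"$HOME/.Rprofile"}
    if [ -f "$profile" ] && grep -q 'setDTthreads' "$profile"; then
        echo "setDTthreads already configured in $profile"
    else
        printf '\n# data.table configuration\ndata.table::setDTthreads(%s)\n' "$n" >> "$profile"
        echo "added setDTthreads($n) to $profile"
    fi
}

# Demo against a scratch file; the second call leaves the file unchanged.
demo=$(mktemp)
add_dt_threads 4 "$demo"
add_dt_threads 4 "$demo"
cat "$demo"

# For real use, run:  add_dt_threads 4
# Then confirm the thread count from a fresh, non-interactive R session.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'library(data.table); getDTthreads(verbose = TRUE)'
else
    echo "Rscript not found on PATH; skipping the check"
fi
```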

Refs

The Makevars configuration was adapted from the following gist.