Using data.table with OpenMP support
If you are struggling with large data sets in R, you might benefit from a performance boost by using data.table. However, when loading data.table, at least on macOS, you may receive a warning that no OpenMP support has been detected and that data.table is operating in single-threaded mode. This limits the benefits of using data.table in the first place, because it does not take full advantage of the underlying hardware.
What is the issue here?
OpenMP is an API for shared-memory multithreading, and the clang compiler that ships with Xcode on macOS lacks support for it. Apple decided not to include the libomp.dylib run-time library in their compiler, which we can check by issuing the following command.
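One way to run such a check is sketched below. The helper name find_libomp and the example directories are assumptions made for illustration, not necessarily the exact command meant here. The function simply looks for a libomp.dylib file in each directory it is given.

```shell
# Print the first libomp.dylib found in the given directories.
# Returns non-zero if none of them contains the library.
# (find_libomp is a hypothetical helper name.)
find_libomp() {
  for dir in "$@"; do
    if [ -f "$dir/libomp.dylib" ]; then
      echo "$dir/libomp.dylib"
      return 0
    fi
  done
  echo "libomp.dylib not found in: $*" >&2
  return 1
}

# Example call (the directories are assumptions; adjust them to your system):
# find_libomp /usr/lib /usr/local/lib /usr/local/opt/llvm/lib
```

If the call fails before LLVM is installed and succeeds afterwards, the run-time library has become available.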
To restore support for OpenMP in clang, one way is to 1) install the latest official LLVM release and 2) instruct R to compile data.table with OpenMP support using a Makevars file. Other R packages that support OpenMP will also benefit from this upgrade.
How to proceed?
On macOS, it is highly recommended to use the Homebrew package manager to ensure both automation and reproducibility. First, we need to make sure that we have the latest version of Xcode installed.
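A minimal sketch of this step, assuming that the Xcode command line tools are what is needed here: xcode-select -p prints the active developer directory when the tools are present, and xcode-select --install asks macOS to install them. The wrapper name ensure_xcode_clt is hypothetical.

```shell
# Report the developer tools location, or trigger their installation.
# (ensure_xcode_clt is a hypothetical helper name.)
ensure_xcode_clt() {
  if ! command -v xcode-select >/dev/null 2>&1; then
    echo "xcode-select not found; this check only applies to macOS" >&2
    return 1
  fi
  if xcode-select -p >/dev/null 2>&1; then
    echo "Developer tools found at: $(xcode-select -p)"
  else
    # Opens the macOS installer dialog for the command line tools
    xcode-select --install
  fi
}

# Example call:
# ensure_xcode_clt
```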
Then, if Homebrew is not already installed, we can install it using the following command (for more details, check the Homebrew website).
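As a cautious sketch, the helper below only checks whether brew is already on the PATH and otherwise prints the installer command published on the Homebrew website instead of running it. The function name check_homebrew is hypothetical, and the installer command may change over time, so it is worth confirming on the website first.

```shell
# Succeed if Homebrew is available; otherwise print how to install it.
# (check_homebrew is a hypothetical helper name.)
check_homebrew() {
  if command -v brew >/dev/null 2>&1; then
    echo "Homebrew found: $(command -v brew)"
    return 0
  fi
  echo "Homebrew not found. The official installer is run with:" >&2
  # Single quotes keep the command literal, so nothing is downloaded here
  echo '/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"' >&2
  return 1
}

# Example call:
# check_homebrew
```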
Now, it’s time to install the latest version of the LLVM compiler.
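A sketch of the installation step, assuming the Homebrew formula is named llvm. The wrapper name install_llvm is hypothetical. After installing, brew --prefix llvm prints the directory the compiler was placed in, which is the path needed later in the Makevars file.

```shell
# Install LLVM through Homebrew and print its install prefix.
# (install_llvm is a hypothetical helper name.)
install_llvm() {
  if ! command -v brew >/dev/null 2>&1; then
    echo "Homebrew not found on PATH" >&2
    return 1
  fi
  brew install llvm || return 1
  # Print where the formula was installed
  brew --prefix llvm
}

# Example call:
# install_llvm
```

On machines where Homebrew does not live under /usr/local (for example Apple Silicon, where the default prefix is /opt/homebrew), the printed path will differ from the one used in this post.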
The LLVM formula is keg-only, which in Homebrew’s terms means that its binaries are not symlinked into /usr/local/, so the system’s compiler stays the default rather than being overridden. So, to instruct R to build new packages using the newly installed compiler, one needs to create a Makevars file and place it in ~/.R/Makevars.
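Creating the file can be sketched as follows. The helper name ensure_makevars and its optional directory argument are assumptions added for illustration; by default it targets ~/.R as described above.

```shell
# Create the R user configuration directory and an empty Makevars file.
# (ensure_makevars is a hypothetical helper name.)
ensure_makevars() {
  r_dir="${1:-$HOME/.R}"
  mkdir -p "$r_dir" || return 1
  # touch creates the file if missing and leaves existing contents unchanged
  touch "$r_dir/Makevars" || return 1
  echo "$r_dir/Makevars"
}

# Example call:
# ensure_makevars
```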
Let’s edit the Makevars file to include the paths of the new compiler and the compiler options required to support OpenMP. For example, on my machine running macOS Big Sur version 11.2.3 (at the time of writing this post), my Makevars file looks as follows.
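As a rough illustration of what such a file can contain, the helper below prints a minimal C/C++-only Makevars for a given LLVM prefix. This is a sketch under assumptions, not the full configuration described in this post: the variable name LLVM_LOC and the function name write_makevars_sketch are hypothetical, and the SDK path and GNU_VER-related settings are left out.

```shell
# Print a minimal OpenMP-enabled Makevars for the LLVM prefix given as $1.
# (write_makevars_sketch and LLVM_LOC are hypothetical names.)
write_makevars_sketch() {
  printf 'LLVM_LOC = %s\n' "$1"
  # Quoted heredoc: $(LLVM_LOC) is kept literally for make to expand
  cat <<'EOF'
CC = $(LLVM_LOC)/bin/clang -fopenmp
CXX = $(LLVM_LOC)/bin/clang++ -fopenmp
LDFLAGS = -L$(LLVM_LOC)/lib -Wl,-rpath,$(LLVM_LOC)/lib
CPPFLAGS = -I$(LLVM_LOC)/include
EOF
}

# Example call (this overwrites an existing ~/.R/Makevars):
# write_makevars_sketch "$(brew --prefix llvm)" > ~/.R/Makevars
```

The -fopenmp flag tells clang to compile and link with OpenMP, while the LDFLAGS line lets the linker and the run-time loader find the library that ships with the Homebrew LLVM.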
Note that you can run xcrun -show-sdk-path to get the path to the developer tools SDK on your system. In addition, to get GNU_VER, one can check the contents of the directory /usr/local/lib/gcc/, which in my case shows that GNU version 11 is installed.
Note: For every GCC upgrade, one needs to modify GNU_VER to match the current version.
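Looking up the version can be scripted. The sketch below assumes that the GCC directory contains one subdirectory per major version, named by its number, as in the case described above. The helper name latest_gcc_major is hypothetical.

```shell
# Print the highest purely numeric entry in the GCC library directory.
# (latest_gcc_major is a hypothetical helper name.)
latest_gcc_major() {
  gcc_dir="${1:-/usr/local/lib/gcc}"
  ls "$gcc_dir" 2>/dev/null | grep -E '^[0-9]+$' | sort -n | tail -n 1
}

# Example calls:
# xcrun -show-sdk-path    # path to the SDK
# latest_gcc_major        # candidate value for GNU_VER
```

Running it again after a GCC upgrade shows whether GNU_VER in the Makevars file needs to be updated.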
Now, with the above Makevars in place, let’s start a new R session and compile data.table from source.
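From a terminal, the same step can be sketched with Rscript, which starts a fresh R session for the one call. The wrapper name install_dt_from_source and the choice of CRAN mirror are assumptions; inside an interactive R session only the install.packages() call is needed.

```shell
# Build data.table from source so that ~/.R/Makevars is picked up.
# (install_dt_from_source is a hypothetical helper name.)
install_dt_from_source() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 1
  fi
  # type = "source" forces compilation instead of using a prebuilt binary
  Rscript -e 'install.packages("data.table", type = "source", repos = "https://cloud.r-project.org")'
}

# Example call:
# install_dt_from_source
```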
To check if we have successfully compiled data.table with OpenMP support, let’s load the library.
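A sketch of this check run from the shell: it loads data.table, which prints its startup message, and then calls getDTthreads(verbose = TRUE) for additional OpenMP details. The wrapper name check_dt_threads is hypothetical.

```shell
# Load data.table in a fresh R session and report its threading setup.
# (check_dt_threads is a hypothetical helper name.)
check_dt_threads() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 1
  fi
  Rscript -e 'library(data.table); getDTthreads(verbose = TRUE)'
}

# Example call:
# check_dt_threads
```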
The message above shows that our data.table is working in multithreaded mode and is currently using 4 threads. To change the number of threads used by data.table and to make the change persistent across sessions, edit ~/.Rprofile and include the following command.
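One possible way to add such a line is sketched below. The helper name set_dt_threads and the requireNamespace() guard are assumptions added for illustration; the guard keeps R from failing at startup if data.table is not installed. setDTthreads() is the data.table function that sets the thread count.

```shell
# Append a setDTthreads() call to an R profile unless it is already there.
# $1: number of threads, $2: profile path (defaults to ~/.Rprofile).
# (set_dt_threads is a hypothetical helper name.)
set_dt_threads() {
  n_threads="$1"
  profile="${2:-$HOME/.Rprofile}"
  line="if (requireNamespace(\"data.table\", quietly = TRUE)) data.table::setDTthreads(${n_threads}L)"
  # Append only if an identical line is not already present
  if ! grep -qxF -- "$line" "$profile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Example call:
# set_dt_threads 4
```

Calling it again with a different number appends a second line. R runs the profile from top to bottom, so the last setting wins, but removing the older line by hand keeps the file tidy.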
Refs
The Makevars configuration was adapted from the following gist.